VALIDITY AND VALIDATION:
THEORIES AND PROCEDURES
25/12/2015
VALIDATION TASK
To establish whether the interpretation and uses
of the VSTEP test scores were valid for measuring the
English language competence of test-takers
from level 3 to level 5 on the Vietnamese English
language competence scale
VALIDITY & VALIDATION
Validity is an integrated evaluative judgment of the degree to
which empirical evidence and theoretical rationales support the
adequacy and appropriateness of inferences and actions based
on test scores or other modes of assessment.
(Messick, 1989)
Validation is to marshal evidence and arguments in support of,
or counter to, proposed interpretations and uses of test scores.
(Messick, 1989)
VALIDITY THEORIES
• 1985 – The 1985 Testing Standards
  - Unified concept of validity
  - Construct-related evidence
  - Content-related evidence
  - Criterion-related evidence
• 1989 – Messick’s Validity Chapter
  - Unified concept of validity
  - Evidential basis (Construct, Relevance, Utility)
  - Consequential basis (Values, Social Consequences)
MESSICK’S (1989) ASPECTS OF VALIDITY
Content
Structural
Consequential
External
Generalizability
Substantive
MESSICK’S (1989) ASPECTS OF VALIDITY
• The content aspect
  - Content relevance
  - Representativeness
  - Technical quality
• The substantive aspect
  Theoretical rationales for observed consistencies in responses
  - Process of performance
  - Empirical evidence of process
MESSICK’S (1989) ASPECTS OF VALIDITY
• The structural aspect
  The fidelity of the scoring structure to the construct structure.
• The generalizability aspect
  The extent to which score properties and interpretations
  generalize to and across groups, settings and tasks
  - Reliability
  - Content representativeness
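Reliability is listed above as one source of evidence for the generalizability aspect. The slides do not name a coefficient, so the following is only a minimal sketch that assumes Cronbach's alpha on dichotomously scored items; the function name and the response matrix are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a persons-by-items score matrix."""
    k = len(responses[0])  # number of items
    # Variance of each item (column) across test-takers.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the total scores across test-takers.
    total_var = pvariance([sum(row) for row in responses])
    if k < 2 or total_var == 0:
        raise ValueError("alpha needs at least 2 items and varying totals")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: rows are test-takers, columns are items (1 = correct).
consistent = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(consistent)
```

When every item orders the test-takers identically, as in this toy matrix, alpha reaches its maximum; real item sets fall below that.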
MESSICK’S (1989) ASPECTS OF VALIDITY
• The external aspect
  - Convergent and discriminant evidence
  - Criterion relevance
  - Applied utility
• The consequential aspect
  Value implications as a basis for action/consequences
  - Bias
  - Fairness
MESSICK’S (1989) VALIDITY FRAMEWORK
• Value
  - The most influential framework of validity
• Criticisms
  - Abstract
  - Difficult for a single researcher to carry out
  - No specific guidance for specific validation contexts
VALIDITY THEORIES
• Kane’s (1992) article and (2006) validation chapter
  Argument-based approach to validation
  - Interpretive argument
    The network of inferences and assumptions
  - Validity argument
    Logical evidence
    Empirical evidence
  - Two stages: the development stage and the appraisal stage
KANE’S (1992) VALIDITY FRAMEWORK
• Values
  - The most practical, objective framework of validity
  - Unique interpretive argument, consistent validity argument
    steps (Bachman, 2004)
• Criticisms
  - No attention to the structural aspect (Messick, 1995)
  - Inadequate attention/method to policy context and
    consequences of tests (McNamara & Roever, 2006).
LANGUAGE TEST VALIDATION
• Bachman’s (1990) framework, after Messick (1989)
• Bachman’s (2004) framework, after Kane (1992)
CHOICE OF VALIDITY FRAMEWORK
• Messick’s (1989) framework
• Six aspects
Content
Structural
Consequential
External
Generalizability
Substantive
VALIDATION QUESTIONS
1. To what extent was the test content relevant to and representative of the domain of English language ability?
2. To what extent was each sub-test successful in measuring students’ English language ability?
3. How well did the test-takers’ test scores on the VSTEP correlate with their test scores on the IELTS?
4. What were the consequences of the UEE English test scores' interpretation and use?
01 CONTENT
• Content relevance
• Technical quality
• Content representativeness
01 CONTENT
RELEVANCE
• Topical content
• Typical behavior
• Underlying process
• Test specifications
01 CONTENT
TECHNICAL QUALITY
Empirical Evidence
• difficulty level
• discriminating power
Expert Judgment
• readability level
• freedom from ambiguity/irrelevancy
• appropriateness of keyed answers & distractors
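The two empirical indicators named above, difficulty level and discriminating power, can be illustrated with classical item statistics. This is a minimal sketch under stated assumptions: difficulty is taken as the proportion correct, discrimination as the point-biserial correlation with the rest-of-test score, and the response matrix and function names are hypothetical. The slides do not say which indices were actually used.

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Facility value: proportion of test-takers who answered correctly."""
    return mean(item_scores)

def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the rest score."""
    # Remove the item from the total so it does not correlate with itself.
    rest = [total - x for x, total in zip(item_scores, total_scores)]
    mx, mr = mean(item_scores), mean(rest)
    cov = mean([(x - mx) * (r - mr) for x, r in zip(item_scores, rest)])
    sx, sr = pstdev(item_scores), pstdev(rest)
    if sx == 0 or sr == 0:
        return float("nan")  # undefined when there is no variation
    return cov / (sx * sr)

# Hypothetical responses: rows are test-takers, columns are items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]
difficulty = item_difficulty(first_item)
discrimination = item_discrimination(first_item, totals)
```

A positive discrimination value means that test-takers who did well on the rest of the test tended to get the item right; values near zero or below flag items for expert review.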
01 CONTENT
REPRESENTATIVENESS
“The breadth of the content specifications for a test should
reflect the breadth of the construct invoked in score
interpretation” (Messick, 1989, p. 35).
All essential components of the construct domain are
covered (Messick, 1994, p. 12).
01 CONTENT
CONTENT ANALYSIS BY EXPERTS
• What knowledge and skills are needed to do each
item correctly?
• How relevant are the items to their assigned
objectives and domain?
Domain
• English secondary school curricula
• English program at the college
01 CONTENT
RASCH ANALYSIS
Item fit statistics
Smith (2004) suggested using item fit statistics to evaluate the
extent to which items tap into the same construct and place
test-takers in the same order.
- the extent to which the use of each item is consistent with the
way people have responded to the other items
- does the item rank order the individuals in a manner similar to
other items? (p. 106)
Smith (2004) argued that test-takers should be ranked
consistently by items measuring the same construct. Items that
misfit the Rasch model, i.e. items that measure a different
construct, should therefore be revised or eliminated (p. 107).
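Item fit under the Rasch model is commonly summarized with infit and outfit mean-square statistics, which compare observed responses with the probabilities the model expects. The sketch below assumes person abilities and the item difficulty (in logits) have already been estimated by Rasch software; the function names and values are hypothetical, and the slides do not state which fit indices were reported.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(item_responses, abilities, difficulty):
    """Infit and outfit mean squares for one dichotomous item."""
    sq_resid_sum = 0.0   # sum of squared residuals (x - p)^2
    info_sum = 0.0       # sum of model variances p * (1 - p)
    z_sq_sum = 0.0       # sum of squared standardized residuals
    for x, theta in zip(item_responses, abilities):
        p = rasch_probability(theta, difficulty)
        w = p * (1.0 - p)
        sq_resid_sum += (x - p) ** 2
        info_sum += w
        z_sq_sum += (x - p) ** 2 / w
    return {
        "infit": sq_resid_sum / info_sum,          # information-weighted
        "outfit": z_sq_sum / len(item_responses),  # unweighted
    }

# Hypothetical ability estimates, ordered from weakest to strongest.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
# An item that ranks people as the model expects ...
consistent_fit = item_fit([0, 0, 1, 1, 1], abilities, difficulty=0.0)
# ... and one that ranks them in the opposite order.
reversed_fit = item_fit([1, 1, 0, 0, 0], abilities, difficulty=0.0)
```

Mean squares near 1 indicate responses about as noisy as the model predicts; the reversed item produces values well above 1, which is the kind of misfit that would send an item to revision.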
02 SUBSTANTIVE & STRUCTURAL
To what extent were the VSTEP sub-tests successful in
measuring students’ English language competence?
ITEM RESPONSE THEORY (RASCH MODEL)
• item fit
• item discrimination
• item cluster
DESCRIPTIVE STATISTICS
• choice response analysis
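Choice response analysis can be illustrated by tabulating how often each option of a multiple-choice item was chosen by lower- and higher-scoring test-takers. This is only a sketch under assumptions: the groups are formed by splitting at the median total score, and the data, option labels, and function name are hypothetical.

```python
from collections import Counter

def choice_distribution(choices, total_scores, key):
    """Count option choices in the lower and upper halves by total score."""
    ranked = sorted(zip(total_scores, choices), key=lambda pair: pair[0])
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("need at least two test-takers")
    low = Counter(choice for _, choice in ranked[:half])
    high = Counter(choice for _, choice in ranked[-half:])
    options = sorted(set(choices) | {key})
    return {
        opt: {"low": low[opt], "high": high[opt], "keyed": opt == key}
        for opt in options
    }

# Hypothetical data for one item whose keyed answer is "A".
choices = ["A", "B", "A", "C", "A", "B", "A", "D"]
total_scores = [30, 12, 28, 10, 25, 15, 27, 9]
table = choice_distribution(choices, total_scores, key="A")
```

A well-functioning item shows the keyed answer chosen mostly by the high group and each distractor chosen mostly by the low group; a distractor that attracts high scorers suggests ambiguity or a miskeyed item.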
03 CRITERION-RELATED
How well did the test-takers’ VSTEP overall and
sub-test scores correlate with the test-takers’
overall and sub-test IELTS scores?
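The criterion-related question above comes down to correlating paired scores from the two tests. A minimal sketch, assuming Pearson's product-moment correlation is the chosen coefficient (the slides do not specify one); the score lists are made-up illustrative values, not results from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists with at least two scores")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical overall scores for the same five test-takers on both tests.
vstep_scores = [4.0, 5.5, 6.0, 7.0, 8.5]
ielts_scores = [4.5, 5.0, 6.0, 6.5, 7.5]
r = pearson_r(vstep_scores, ielts_scores)
```

The same function can be applied sub-test by sub-test (for example listening against listening) to build the full correlation table the question asks for.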
04 CONSEQUENCES
• The value implications of score interpretation
• The actual and potential consequences of score uses
(Messick, 1989)
FOCUS: on validity of test score interpretation and
use - construct under-representation or
construct-irrelevant variance
04 CONSEQUENCES
Sources of evidence
• Content relevance and representativeness
• Item bias
• Technical quality of the test
• Expert judgment
References
• American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for Educational and Psychological Testing. Washington, DC: Authors.
• American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
• Andrich, D., & Mercer, A. (1997). International perspectives on selection methods of entry into higher education. Canberra: National Board of Employment, Education and Training [and] Higher Education Council.
• Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
• Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
• Berk, R. A. (1980). Item analysis. In R. A. Berk (Ed.), Criterion-referenced measurement: The state of the art. Baltimore and London: The Johns Hopkins University Press.
• Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621-694). Washington, DC: American Council on Education.
• Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
• Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
• Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: American Council on Education/Praeger.
• Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(4), 635-694.
• McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden, MA: Blackwell Publishing.
• Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
• MOET. (2006). Secondary Education Curriculum: English. Hanoi: Education Publisher.
• Moss, P. A. (2007). Reconstructing validity. Educational Researcher, 36(8), 470-476.
• Popham, W. J. (1997). Consequential validity: Right concern, wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
• Purpura, J. E. (1999). Learner strategy use and performance on language tests: A structural equation modeling approach. Cambridge: Cambridge University Press.
• Smith, E. V. (2004). Evidence for reliability of measures and validity of measure interpretation: A Rasch measurement perspective. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, models and applications. Maple Grove: JAM Press.
• Wu, M. L., Adams, R. J., & Haldane, S. (2008). ConQuest: Generalised item response modelling software [computer program]. Camberwell: Australian Council for Educational Research.
THANK YOU
FOR YOUR ATTENTION
Q & A
 
Guiding questions for reading materials
Guiding questions for reading materialsGuiding questions for reading materials
Guiding questions for reading materials
 
Listening item submission template
Listening item submission templateListening item submission template
Listening item submission template
 
Examining reading
Examining readingExamining reading
Examining reading
 
Reading
ReadingReading
Reading
 
Reading
ReadingReading
Reading
 
Writing good multiple choice test questions
Writing good multiple choice test questionsWriting good multiple choice test questions
Writing good multiple choice test questions
 
Reading
ReadingReading
Reading
 
Nghe slide - testing listening skill slides
Nghe   slide - testing listening skill slidesNghe   slide - testing listening skill slides
Nghe slide - testing listening skill slides
 
Vstep listening item writer
Vstep listening item writerVstep listening item writer
Vstep listening item writer
 
Tham chiếu khung cefr của các bài thi
Tham chiếu khung cefr của các  bài thiTham chiếu khung cefr của các  bài thi
Tham chiếu khung cefr của các bài thi
 
Online version 20151003 main issues in language testing
Online version 20151003 main issues in language   testingOnline version 20151003 main issues in language   testing
Online version 20151003 main issues in language testing
 
Ke hoach to chuc bd can bo ra de thi 2015
Ke hoach to chuc bd can bo ra de thi   2015Ke hoach to chuc bd can bo ra de thi   2015
Ke hoach to chuc bd can bo ra de thi 2015
 
Khung chtr của 2 hợp phần
Khung chtr của 2 hợp phầnKhung chtr của 2 hợp phần
Khung chtr của 2 hợp phần
 
Google Forms
Google FormsGoogle Forms
Google Forms
 

Kürzlich hochgeladen

ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 

Kürzlich hochgeladen (20)

Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 

Validity and Validation Theories and Procedures

  • 1. VALIDITY AND VALIDATION: THEORIES AND PROCEDURES (25/12/2015)
  • 2. VALIDATION TASK
      - To establish whether the interpretation and uses of the VSTEP test scores were valid for measuring the English language competence of test-takers from level 3 to level 5 on the Vietnamese English language competence scale.
  • 3. VALIDITY & VALIDATION
      - Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. (Messick, 1989)
      - Validation is to marshal evidence and arguments in support of, or counter to, proposed interpretations and uses of test scores. (Messick, 1989)
  • 4. VALIDITY THEORIES
      - 1985: The 1985 Testing Standards
          - Unified concept of validity
          - Construct-related evidence
          - Content-related evidence
          - Criterion-related evidence
      - 1989: Messick's Validity Chapter
          - Unified concept of validity
          - Evidential basis (Construct, Relevance, Utility)
          - Consequential basis (Values, Social Consequences)
  • 5. MESSICK (1989)'S ASPECTS OF VALIDITY
      - Content
      - Substantive
      - Structural
      - Generalizability
      - External
      - Consequential
  • 6. MESSICK (1989)'S ASPECTS OF VALIDITY
      - The content aspect
          - Content relevance
          - Representativeness
          - Technical quality
      - The substantive aspect: theoretical rationales for observed consistencies in responses
          - Process of performance
          - Empirical evidence of process
  • 7. MESSICK (1989)'S ASPECTS OF VALIDITY
      - The structural aspect: the fidelity of the scoring structure to the construct structure
      - The generalizability aspect: the extent to which score properties and interpretations generalize to and across groups, settings and tasks
          - Reliability
          - Content representativeness
  • 8. MESSICK (1989)'S ASPECTS OF VALIDITY
      - The external aspect
          - Convergent and discriminant evidence
          - Criterion relevance
          - Applied utility
      - The consequential aspect: value implications as a basis for action/consequences
          - Bias
          - Fairness
  • 9. MESSICK (1989)'S VALIDITY FRAMEWORK
      - Value
          - The most influential framework of validity
      - Criticisms
          - Abstract
          - Difficult for a single researcher to carry out
          - No specific guidance for specific validation contexts
  • 10. VALIDITY THEORIES
      - Kane's (1992) article and (2006) Validity Chapter: Argument-based Approach to Validation
          - Interpretive Argument: the network of inferences and assumptions
          - Validity Argument
              - Logical evidence
              - Empirical evidence
      - The Development Stage
      - The Appraisal Stage
  • 11. KANE (1992)'S VALIDITY FRAMEWORK
      - Values
          - The most practical, objective framework of validity
          - Unique interpretive argument, consistent validity argument steps (Bachman, 2004)
      - Criticisms
          - No attention to the structural aspect (Messick, 1995)
          - Inadequate attention to, and methods for, the policy context and consequences of tests (McNamara & Roever, 2006)
  • 12. LANGUAGE TEST VALIDATION
      - Bachman's (1990) framework, after Messick's (1989)
      - Bachman's (2004) framework, after Kane's (1992)
  • 13. CHOICE OF VALIDITY FRAMEWORK
      - Messick's (1989) six aspects: Content, Substantive, Structural, Generalizability, External, Consequential
  • 14. VALIDATION QUESTIONS
      1. To what extent was the test content relevant to and representative of the domain of English language ability?
      2. To what extent was each sub-test successful in measuring students' English language ability?
      3. How well did the test-takers' test scores on the VSTEP correlate with their test scores on the IELTS?
      4. What were the consequences of the UEE English test scores' interpretation and use?
  • 15. 01 CONTENT
      - Content relevance
      - Technical quality
      - Content representativeness
  • 16. 01 CONTENT: RELEVANCE
      - Topical content
      - Typical behavior
      - Underlying process
      - Test specifications
  • 17. 01 CONTENT: TECHNICAL QUALITY
      - Empirical evidence
          - Difficulty level
          - Discriminating power
      - Expert judgment
          - Readability level
          - Freedom from ambiguity and irrelevancy
          - Appropriateness of keyed answers and distractors
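The two empirical indices named on slide 17 can be illustrated with classical item analysis. The sketch below is only one possible reading, not the procedure used in the VSTEP study: it assumes dichotomously scored items (1 = correct, 0 = incorrect), takes difficulty as the proportion correct, and takes discriminating power as the correlation between an item and the rest score. The function names and the response matrix are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(responses, item):
    """Proportion of test-takers who answered the item correctly (0 to 1)."""
    return mean(row[item] for row in responses)


def item_discrimination(responses, item):
    """Point-biserial correlation between an item and the rest score
    (the total score with the item itself excluded)."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return 0.0  # undefined without variance; reported as 0 in this sketch
    m_item, m_rest = mean(item_scores), mean(rest_scores)
    cov = mean((x - m_item) * (y - m_rest)
               for x, y in zip(item_scores, rest_scores))
    return cov / (sd_item * sd_rest)


# Hypothetical data: one row per test-taker, one column per item.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]

for i in range(len(responses[0])):
    print(i, item_difficulty(responses, i), item_discrimination(responses, i))
```

An item with a negative discrimination value, such as the last column of the hypothetical data, is answered correctly more often by weaker test-takers and would normally be flagged for expert review.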
  • 18. 01 CONTENT: REPRESENTATIVENESS
      - "The breadth of the content specifications for a test should reflect the breadth of the construct invoked in score interpretation" (Messick, 1989, p. 35).
      - All essential components of the construct domain are covered (Messick, 1994, p. 12).
  • 19. 01 CONTENT: CONTENT ANALYSIS BY EXPERTS
      - What knowledge and skills are needed to do each item correctly?
      - How relevant are the items to their assigned objectives and domain?
      - Domain
          - English secondary school curricula
          - English program at the college
  • 21. 01 CONTENT: Item fit statistics
      - Smith (2004) suggested using item fit statistics to evaluate the extent to which items tap into the same construct and place test-takers in the same order:
          - the extent to which the use of each item is consistent with the way people have responded to the other items
          - does the item rank order the individuals in a manner similar to other items? (p. 106)
      - Smith (2004) argued that test-takers should be ranked consistently by items measuring the same construct. If they are not, the items that misfit the Rasch model, i.e. the items that measure a different construct, should be subject to revision or elimination (p. 107).
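As a rough illustration of the item fit statistics described on slide 21, the sketch below computes the commonly used infit and outfit mean squares for one dichotomous item under the Rasch model. It assumes person abilities and the item difficulty have already been estimated on the logit scale by separate software, and it is not a reproduction of any particular program's output. All names and numbers are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Rasch model probability of a correct response for ability theta
    and item difficulty b, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 scores on the item, one per person
    thetas:    ability estimates for the same persons (assumed given)
    b:         difficulty estimate for the item (assumed given)
    """
    sq_residuals, variances = [], []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        sq_residuals.append((x - p) ** 2)   # squared residual
        variances.append(p * (1.0 - p))     # model variance of the response
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(responses)
    # Infit: information-weighted mean square.
    infit = sum(sq_residuals) / sum(variances)
    return infit, outfit


# Hypothetical abilities and two response patterns on an item with b = 0.
thetas = [-2.0, -1.0, 1.0, 2.0]
print(item_fit([0, 0, 1, 1], thetas, 0.0))  # able persons succeed
print(item_fit([1, 1, 0, 0], thetas, 0.0))  # able persons fail
```

Mean squares near 1 indicate responses about as noisy as the model expects. The second pattern, in which the item orders test-takers against their ability estimates, yields values well above 1 and would be a candidate for the revision or elimination that Smith (2004) describes.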
  • 22. 02 SUBSTANTIVE & STRUCTURAL
      - To what extent were the VSTEP sub-tests successful in measuring students' English language competence?
      - ITEM RESPONSE THEORY (RASCH MODEL)
          - item fit
          - item discrimination
          - item cluster
      - DESCRIPTIVE STATISTICS
          - choice response analysis
  • 23. 03 CRITERION-RELATED
      - How well did the test-takers' VSTEP overall and sub-test scores correlate with the test-takers' overall and sub-test IELTS scores?
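The criterion-related question on slide 23 comes down to correlating paired scores. A minimal sketch follows, assuming each test-taker has one VSTEP score and one IELTS score and that a Pearson coefficient is an acceptable summary. The slides do not state which coefficient was used, and the scores below are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired overall scores for five test-takers.
vstep_scores = [4.0, 5.5, 6.0, 7.5, 8.0]
ielts_scores = [4.5, 5.0, 5.5, 6.5, 7.0]

print(round(pearson_r(vstep_scores, ielts_scores), 3))
```

The same function can be applied sub-test by sub-test (for example VSTEP listening against IELTS listening). With ordinal band scores, a rank-based coefficient such as Spearman's may be a reasonable alternative.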
  • 24. 04 CONSEQUENCES
      - The value implications of score interpretation
      - The actual and potential consequences of score uses (Messick, 1989)
      - FOCUS: on validity of test score interpretation and use (construct under-representation or construct-irrelevant variance)
  • 25. 04 CONSEQUENCES: Sources of evidence
      - Content relevance and representativeness
      - Item bias
      - Technical quality of the test
      - Expert judgment
  • 26. References
      - American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for Educational and Psychological Testing. Washington, DC: Authors.
      - American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
      - Andrich, D., & Mercer, A. (1997). International perspectives on selection methods of entry into higher education. Canberra: National Board of Employment, Education and Training [and] Higher Education Council.
      - Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
      - Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
      - Berk, R. A. (1980). Item analysis. In R. A. Berk (Ed.), Criterion-referenced measurement: The state of the art. Baltimore and London: The Johns Hopkins University Press.
      - Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621-694). Washington, DC: American Council on Education.
      - Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
      - Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
      - Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: American Council on Education/Praeger.
      - Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(4), 635-694.
      - McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden, MA: Blackwell Publishing.
      - Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
      - MOET. (2006). Secondary Education Curriculum: English. Hanoi: Education Publisher.
      - Moss, P. A. (2007). Reconstructing validity. Educational Researcher, 36(8), 470-476.
      - Popham, W. J. (1997). Consequential validity: Right concern, wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
      - Purpura, J. E. (1999). Learner strategy use and performance on language tests: A structural equation modeling approach. Cambridge: Cambridge University Press.
      - Smith, E. V. (2004). Evidence for reliability of measures and validity of measure interpretation: A Rasch measurement perspective. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, models and applications. Maple Grove: JAM Press.
      - Wu, M. L., Adams, R. J., & Haldane, S. (2008). ConQuest: Generalised Item Response Modelling Software [computer program]. Camberwell: Australian Council for Educational Research.
  • 27. THANK YOU FOR YOUR ATTENTION