Larry D. Gruppen, Ph.D.
University of Michigan
From Concepts to Data:
Conceptualization,
Operationalization, and
Measurement
in Educational Research
Objectives
• Identify key research
design issues
• Wrestle with the
complexities of
educational measurement
• Explain the concepts of
reliability and validity in
educational measurement
• Apply criteria for
measurement quality
when conducting
educational research
Agenda
• A brief nod to design
• From theory to measurement
• Criteria for measurement quality
– Reliability
– Validity
• Application: analyze an article
Guiding Principles for
Scientific Research in Education
1. Question: pose significant question that can be
investigated empirically
2. Theory: link research to relevant theory
3. Methods: use methods that permit direct investigation of
the question
4. Reasoning: provide coherent, explicit chain of reasoning
5. Replicate and generalize across studies
6. Disclose research to encourage professional scrutiny and
critique
Study design
• Study design consists of:
– Your measurement method(s)
– The participants and how they are assigned
– The intervention
– The sequence and timing of measurements
and interventions
Comparison Group
• Pre-post design - compare intervention group to
itself
• Non-equivalent control group design - compare
intervention group to an existing group
• Randomized control group design - compare to
equivalent controls
Overview of Study Designs
• Symbols
– Each line represents a group.
– x = Intervention (e.g. treatment)
– O1, O2, O3…= Observation (measurement) at
Time 1, Time 2, Time 3, etc.
– R = Random assignment
Non-Experimental Designs
One-Group Posttest
x O1
Quasi-Experimental Designs
Posttest-Only Control Group
x O1
  O1
One-Group Pretest-Posttest
O1 x O2
Control Group Pretest-Posttest
O1 x O2
O1   O2
Experimental Designs
Posttest-Only Randomized Control Group
R x O1
R   O1
Randomized Control Group Pretest-Posttest
R O1 x O2
R O1   O2
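As a rough illustration of the randomized control group pretest-posttest design, the sketch below assigns participants at random (R) and compares the mean gain (O2 minus O1) of the intervention group with that of the control group. The function names, the seed, and all scores are hypothetical and invented for this example; a real study would also need an appropriate significance test.

```python
import random
from statistics import mean

def randomly_assign(participants, seed=0):
    """R: shuffle the participants and split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_gain(pre, post):
    """Mean of O2 - O1 across the members of one group."""
    return mean(o2 - o1 for o1, o2 in zip(pre, post))

# Hypothetical scores, invented for illustration only.
intervention_pre, intervention_post = [52, 60, 47, 55], [68, 71, 60, 69]
control_pre, control_post = [50, 58, 49, 56], [54, 60, 50, 61]

# Difference between the two groups' mean gains.
effect = (mean_gain(intervention_pre, intervention_post)
          - mean_gain(control_pre, control_post))
print(effect)
```

Subtracting the control group's gain is what separates the intervention from history, maturation, and testing effects, which act on both groups.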
From Theory to Measurement
Theory → Constructs → Operational Definition → Measurement
Measurement
• Measurement:
assignment of numbers
to objects or events
according to rules
• Quality: reliability and
validity
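To make "assignment of numbers according to rules" concrete, here is a minimal sketch of a scoring rule for a rating-scale questionnaire. The five response labels, the 1 to 5 coding, and the reverse-keyed item are assumptions chosen for illustration, not part of any particular instrument.

```python
# Hypothetical rule: each response label maps to a number from 1 to 5.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score_item(response, reverse=False):
    """Apply the rule to one response; reverse-keyed items are flipped."""
    value = LIKERT[response.strip().lower()]
    return 6 - value if reverse else value

def scale_score(responses, reverse_items=()):
    """Sum the item scores; reverse_items holds zero-based item indexes."""
    return sum(score_item(r, i in reverse_items)
               for i, r in enumerate(responses))

answers = ["agree", "strongly agree", "disagree"]
total = scale_score(answers, reverse_items={2})
print(total)
```

The rule, not the number itself, carries the meaning: changing the coding or the set of reverse-keyed items changes what the total represents.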
The Challenge of Educational
Measurement
• Almost all of the constructs we are interested in
are buried inside the individual
• Measurement depends on transforming these
internal states, events, capabilities, etc. into
something observable
• Making them observable may alter the thing we
are measuring
Examples of Measurement Methods
• Tests (knowledge, performance): defined
response, constructed response, simulations
• Questionnaires (attitudes, beliefs, preferences):
rating scales, checklists, open-ended responses
• Observations (performance, skills): tasks
(varying degrees of authenticity), problems, real-
world behaviors, records (documents)
Reliability
• Dependability (consistency or stability) of
measurement
• A necessary condition for validity
Types of Reliability
• Stability (produces the same results with repeated measurements
over time):
– Test-retest
– Correlation between scores at 2 times
• Equivalence/Internal Consistency (produces same results with
parallel items on alternate forms):
– Alternate forms; split-half; Kuder-Richardson; Cronbach’s alpha
– Correlation between scores on different forms; calculate
coefficient alpha (α)
• Consistency (produces the same results with different observers or
raters):
– Inter-rater agreement
– Correlation between scores from different raters; kappa
coefficient
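The three families of coefficients named above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not a substitute for a statistics package: the function names are hypothetical, the kappa shown is Cohen's kappa for two raters, and the formulas are the usual textbook definitions.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson_r(x, y):
    """Correlation between two score lists (test-retest, alternate forms)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

def cronbach_alpha(items):
    """items: one list of scores per item, same respondent order in each."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.
    Undefined (division by zero) when expected agreement equals 1."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)
```

For example, `pearson_r(time1_scores, time2_scores)` would estimate stability, `cronbach_alpha([item1, item2, item3])` internal consistency, and `cohen_kappa(ratings_a, ratings_b)` inter-rater consistency, where all of these variable names stand for your own data.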
Validity
• Refers to the accuracy of inferences based on
data obtained from measurement
• Technically, measures aren’t valid, inferences
are
• No such thing as validity in the abstract: the key
issue is ‘valid’ for what inference
• Want to reduce systematic, non-random error
• Unreliability lowers correlations, reducing validity
claims
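The last point can be made precise with the classical test theory attenuation relation, shown here as a sketch. The symbols are not from the slides: $r_{xy}$ is the observed correlation between measures $x$ and $y$, $\rho_{xy}$ is the correlation between their true scores, and $r_{xx}$ and $r_{yy}$ are the reliabilities of the two measures.

```latex
r_{xy} = \rho_{xy}\,\sqrt{r_{xx}\, r_{yy}}
```

For instance, if the true-score correlation were 0.60 and the reliabilities were 0.70 and 0.80, the observed correlation would be about $0.60 \times \sqrt{0.56} \approx 0.45$, so the evidence for a validity claim looks weaker than the underlying relationship.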
Conventional View of Validity
• Face validity: logical link between items and purpose—
makes sense on the surface
• Content validity: items cover the range of meaning
included in the construct or domain. Expert judgment
• Criterion validity: relationship between performance on
one measurement and performance on another (or
actual behavior). Concurrent and predictive. Correlation
coefficients
• Construct validity: directly connect measurement with
theory. Allows interpretation of empirical evidence in
terms of theoretical relationships. Based on weight of
evidence. Convergent and discriminant evidence.
Multitrait-MultiMethod Analysis (MTMM)
Unified View of Construct Validity
(Messick S, Amer Psych, 1995)
• Validity is not a property of an instrument but rather of
the meaning of the scores. Must be considered
holistically.
• 6 Aspects of Construct Validity Evidence
– Content—content relevance & representativeness
– Substantive—theoretical rationale for observed consistencies in
test responses
– Structural—fidelity of scoring structure to structure of construct
domain
– Generalizability—generalization to the population and across
populations
– External—convergent and discriminant evidence
– Consequential—intended and unintended consequences of
score interpretation; social consequence of assessment
(fairness, justice)
Finding Measurement Instruments
• Scan the engineering education literature (obviously)
• Email engineering ed researchers (use the network)
• Examine literature for instruments used in prior studies
• General education/social science instrument databases
– Buros Institute of Mental Measurements (Mental
Measurement Yearbook, Tests in Print)
http://buros.unl.edu/buros/jsp/search.jsp
– ERIC databases http://www.eric.ed.gov/
– Educational Testing Service Test Collection
http://www.ets.org/testcoll/index.html
• Construct your own (last resort!)
– Get some expert consultation (test writing, survey
design, questionnaire construction, etc.)
Example
• In your groups, analyze the Steif & Dantzler
statics concept inventory article. Look for:
– Theoretical framework
– Constructs used in the study
– How constructs were operationalized
– Measurement process
• Attention to reliability and validity
References
• Campbell DT, Stanley JC. Experimental and quasi-
experimental designs for research. Chicago: Rand
McNally; 1969.
• Cook TD, Campbell DT. Quasi-experimentation: design
and analysis for field settings. Chicago: Rand
McNally; 1979.
• Messick S. Validity of psychological assessment:
validation of inferences from persons' responses and
performances as scientific inquiry into score meaning.
American Psychologist. 1995;50:741-749.
• Messick S. Validity. In: Linn RL, ed. Educational
measurement. 3rd ed. New York: American Council on
Education & Macmillan; 1989:13-103.

Editor's Notes

  1. 90 minute session Steif analysis = 40 min?
  2. Learning (cognitive theory, constructivist theory, social cognitive theory, plus some current interesting things derived from each, like expert-novice differences, transfer issues, ?). Motivation (probably goal theory, self-efficacy, expectancy-value, self-determination, maybe something on negative motivation like anxiety). Developmental (probably cognitive development a la Perry, epistemological development, Baxter-Magolda, etc.). Individual differences (prior knowledge, development, motivation, strategy repertoires and self-regulation, styles, etc.).
  3. Highlight principle 3 (Methods) as the item for this session: how we get the data to permit 'direct investigation'. Also relevant to principle 1 ("empirically").
  4. A study design consists of decisions about several issues and the arrangement or timing of events in the study. What you are measuring stems quite directly from the hypothesis or research question, which identifies the outcome or phenomenon of interest (learning or time use or cost, etc.). We'll address this in more detail in the next topic. The selection and assignment of participants also should follow from the hypothesis, but frequently we do research on 'convenience samples' of whatever students we can get access to, whether they are appropriate or not. The intervention has to be defined quite clearly, both in terms of activities and timing. This is particularly true for complex educational interventions. Going back to our videotaped lecture example, we need to define whether the intervention is access to videotapes of all lectures, access to those of a specific course, or to that of a specific lecture. The sequence and timing of measurements and intervention(s) is another critical decision. Measuring outcomes immediately after the intervention is most likely to show an impact, but a delayed measurement will more accurately assess how lasting the impact might be. You can, of course, do multiple measurements at various times, but all these need to be defined as part of the study design.
  5. The whole issue of randomization is the other problem that plagues most medical education studies. The most common, and often unrecognized, manifestation of this is in the selection of students for the study. Not only are medical students a highly (and nonrandomly) selected population to begin with, but our studies often take students who self-select for specific educational activities or elect to participate or not participate on a non-random basis. The other problem with randomization is the one I just mentioned on the previous slide: non-random assignment of students. In our videotape example, we have the problem of students self-selecting to view the videotapes or not. It is feasible to imagine random assignment of students to view the tapes or not, but that creates ethical as well as pragmatic problems.
  6. Too many education researchers content themselves with a simple description of a program or an intervention or an observation, supported by some data collected from one group of students at one point in time. While this kind of research provides some useful information, the absence of a comparison group prevents us from being able to fully interpret the value of the intervention. We need to compare these results to SOMETHING, and the better the quality of that 'something,' the better the study design. One fairly simple comparison group is the same students prior to the intervention. Although this isn't the strongest design, it is better than nothing and often feasible to do. Another design would be to find a comparison group that, while not entirely equivalent to the intervention group, serves as a useful point of reference. An example of this would be to compare the intervention students to students at the same point in the curriculum from previous years. We don't know all the ways in which the two cohorts might differ, besides the intervention, so it isn't problem-free, but again, it provides a useful comparison. The best design would be to randomly assign students to the control and intervention conditions. While scientifically strong, it is seldom pragmatically feasible.
  7. Strengths
     • Useful in exploring new problems
     • Developing ideas or devices
     Weaknesses
     • No control and no internal validity
     • No ability to make comparisons (conclusions can only be impressionistic or imprecise; using historical or standardized populations is not wise)
  8. Strengths
     • No effect of pretesting
     • Useful when pretests are unavailable, inconvenient, or too expensive
     • Also useful when participant anonymity must be maintained
     Weaknesses
     • No ability to measure the effect of the intervention (treatment)
     • Controls for, but cannot estimate, the effects of maturation and history
     • Possible selection differences (groups could be different in some fundamental way)
     • Reactive effects of experimental procedures?
  9. Strengths
     • Compares the performance of the same group
     • Controls for selection (if same participants)
     • Controls for mortality (if same participants)
     Weaknesses
     • No assurance that the intervention is the only factor in the difference between O1 and O2
     • Threats to validity: history; maturation; testing effects; statistical regression (for extreme groups)
     • Reactive effects of experimental procedures?
  10. Strengths
     • Good internal validity
     • Control groups allow us to estimate the effects of history, maturation, and testing
     • Controls mortality effects (by checking pre and post measures)
     Weaknesses
     • Possible selection differences (groups could be different in some fundamental way)
     • Reactive effects of experimental procedures?
  11. Theory: conceptualization by specifying precisely what we mean by a term (e.g., learning, expertise, socialization, motivation). Constructs: theoretical creations based on observations but which cannot be observed directly or indirectly. Hypothetical, abstract, defined concepts. Created by scientists. Come from theory. E.g., learning, problem solving, critical thinking, cognitive development, attribution, locus of control. Operational definition: spells out precisely how the concept will be measured, i.e., what the variables are. A description of the operations that will be used to measure the concept. In education, these typically depend on some behavior on the part of the learners: answering questions on a survey, making presentations, solving problems, working in groups, etc. It must be observable [remember - “empirical”]. Measurement: this critical step is central to both qualitative and quantitative research. It is more apparent in quantitative research, but the issues, challenges, and decisions are analogous. We will focus on quantitative applications and examples, but keep in mind that the principles also apply to qualitative research methods. So we will spend our session today looking at principles of educational measurement.
  12. Scenario: You’ve noticed that students vary considerably in how they react to feedback in the form of grades or written evaluations. Some take any criticism as a personal attack, whereas others seem to be immune to any efforts you make to tell them they need to improve their performance. Like a good educational researcher, you investigate what the literature has to say on the matter and stumble across a theoretical framework called “attribution theory” that seems relevant. Describe attribution theory. Examples of attributions: you are driving and someone blows their horn at you or flips you the finger. Is that an intrinsic or extrinsic attribution - my problem or his? Golf shots: good ones are due to my ability, bad ones are due to luck.
  14. The constructs are generally internal, especially in constructivist and cognitive theoretical frameworks. Behaviorism is attractive in that these internal states don’t matter.
  15. Examples: Stability - administer your final exam in thermodynamics on the last day of class and re-administer it a day later to the same people. You would expect the results to be the same. If you did this on the first day of class and again on the last, you’d expect the scores to change. Equivalence - two bathroom scales should give you the same weight in the morning. Two versions of a final exam that test the same content should give the same scores as well. Internal consistency - to what extent are all the items on the exam measuring the same construct, thermodynamics? If some are on thermodynamics and others on hydrodynamics, the test is not internally consistent and you should derive two scores from it rather than one.
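These three kinds of reliability are commonly quantified with a correlation (stability and equivalence) and with Cronbach's alpha (internal consistency). The talk itself does not name specific statistics, so treat the choices below as an assumption, and the exam scores as invented for illustration.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two lists of paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row per examinee, one column per item."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Stability: same exam given a day apart (hypothetical scores).
day_1 = [72, 85, 90, 64, 78]
day_2 = [70, 88, 91, 66, 75]
print(f"test-retest r = {pearson_r(day_1, day_2):.2f}")

# Equivalence would use the same function on scores from form A and form B.

# Internal consistency: item-level scores (0-5) for four examinees.
items = [[5, 4, 5], [3, 3, 2], [1, 2, 1], [4, 4, 5]]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

A low alpha on a mixed thermodynamics and hydrodynamics exam would be the numerical hint that two separate scores are more defensible than one.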
  16. Content: determining the boundaries of the construct domain. Determining the knowledge, skills, attitudes, motives, and other attributes to be revealed by the measurement tasks. Addressed by means of job analysis, task analysis, curriculum analysis, domain theory. Must also attend to the representativeness of the tasks selected for assessment. Substantive: emphasizes the role of substantive theories and process modeling in identifying the domain processes to be revealed in assessment tasks. Derived from think-aloud protocols, correlation patterns among part scores, modeling of task performance. Structural: theory should guide not only the selection of relevant tasks (substantive) but also the development of scoring criteria and rubrics. Generalizability: interpretations should not be limited to the sample of assessed tasks but should be broadly generalizable to the construct domain. External: MTMM (multitrait-multimethod matrix). Consequential: social and value-related issues. Should accrue evidence of purported positive consequences. The primary issue is that any negative impact should not be derived from any source of test invalidity.
  17. Debrief in general, asking for volunteers to comment on each of the four dimensions. Theory should be challenging in the sense that it is not apparent in the article.