1. Larry D. Gruppen, Ph.D.
University of Michigan
From Concepts to Data:
Conceptualization, Operationalization, and
Measurement in Educational Research
2. Objectives
• Identify key research
design issues
• Wrestle with the
complexities of
educational measurement
• Explain the concepts of
reliability and validity in
educational measurement
• Apply criteria for
measurement quality
when conducting
educational research
3. Agenda
• A brief nod to design
• From theory to measurement
• Criteria for measurement quality
– Reliability
– Validity
• Application: analyze an article
4. Guiding Principles for
Scientific Research in Education
1. Question: pose a significant question that can be
investigated empirically
2. Theory: link research to relevant theory
3. Methods: use methods that permit direct investigation of
the question
4. Reasoning: provide coherent, explicit chain of reasoning
5. Replicate and generalize across studies
6. Disclose research to encourage professional scrutiny and
critique
5. Study design
• Study design consists of:
– Your measurement method(s)
– The participants and how they are assigned
– The intervention
– The sequence and timing of measurements
and interventions
6. Comparison Group
• Pre-post design - compare intervention group to
itself
• Non-equivalent control group design - compare
intervention group to an existing group
• Randomized control group design - compare to
equivalent controls
7. Overview of Study Designs
• Symbols
– Each line represents a group.
– x = Intervention (e.g. treatment)
– O1, O2, O3…= Observation (measurement) at
Time 1, Time 2, Time 3, etc.
– R = Random assignment
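Using these symbols, the three comparison-group designs named on the previous slide can be sketched as follows (a plain-text rendering; the original design diagrams on slides 8-18 are not reproduced in this transcript):

    Pre-post design:                 O1  x  O2
    Non-equivalent control group:    O1  x  O2
                                     O1      O2
    Randomized control group:     R  O1  x  O2
                                  R  O1      O2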
19. The Challenge of Educational
Measurement
• Almost all of the constructs we are interested in
are buried inside the individual
• Measurement depends on transforming these
internal states, events, capabilities, etc. into
something observable
• Making them observable may alter the thing we
are measuring
20. Examples of Measurement Methods
• Tests (knowledge, performance): defined
response, constructed response, simulations
• Questionnaires (attitudes, beliefs, preferences):
rating scales, checklists, open-ended responses
• Observations (performance, skills): tasks
(varying degrees of authenticity), problems, real-
world behaviors, records (documents)
22. Types of Reliability
• Stability (produces the same results with repeated measurements
over time):
– Test-retest
– Correlation between scores at 2 times
• Equivalence/Internal Consistency (produces same results with
parallel items on alternate forms):
– Alternate forms; split-half; Kuder-Richardson; Cronbach's alpha
– Correlation between scores on different forms; calculate
coefficient alpha (α)
• Consistency (produces the same results with different observers or
raters):
– Inter-rater agreement
– Correlation between scores from different raters; kappa
coefficient (a worked sketch of these three coefficients follows below)
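To make these coefficients concrete, here is a minimal Python sketch. The score arrays are invented for illustration, and Cronbach's alpha and Cohen's kappa are computed directly from their standard formulas with numpy rather than from any measurement library:

    import numpy as np

    # Stability: test-retest reliability as the correlation between two occasions
    time1 = np.array([70, 82, 65, 90, 75])
    time2 = np.array([72, 80, 68, 88, 77])
    r_test_retest = np.corrcoef(time1, time2)[0, 1]

    # Internal consistency: Cronbach's alpha over an examinee-by-item score matrix
    items = np.array([[1, 0, 1, 1],
                      [1, 1, 1, 1],
                      [0, 0, 0, 0],
                      [1, 1, 1, 1],
                      [1, 0, 1, 0]])
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of total scores
    alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Consistency: Cohen's kappa for two raters assigning categorical ratings
    rater1 = np.array([1, 2, 2, 3, 1])
    rater2 = np.array([1, 2, 3, 3, 1])
    p_observed = np.mean(rater1 == rater2)          # observed agreement
    categories = np.union1d(rater1, rater2)
    p_chance = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)
    kappa = (p_observed - p_chance) / (1 - p_chance)

    print(round(r_test_retest, 2), round(alpha, 2), round(kappa, 2))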
23. Validity
• Refers to the accuracy of inferences based on
data obtained from measurement
• Technically, measures aren’t valid, inferences
are
• No such thing as validity in the abstract: the key
issue is ‘valid’ for what inference
• Want to reduce systematic, non-random error
• Unreliability lowers correlations, reducing validity
claims
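One way to see the last point is Spearman's correction for attenuation (standard psychometric notation, not from the slides): if $r_{xy}$ is the observed correlation between two measures and $r_{xx}$ and $r_{yy}$ are their reliabilities, the correlation corrected for unreliability is

$$ r_{x_t y_t} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} $$

For example, an observed validity coefficient of 0.40 between two measures that each have reliability 0.64 implies a disattenuated correlation of 0.40 / 0.64 ≈ 0.63; unreliability alone can make a strong relationship look weak.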
24. Conventional View of Validity
• Face validity: logical link between items and purpose—
makes sense on the surface
• Content validity: items cover the range of meaning
included in the construct or domain. Expert judgment
• Criterion validity: relationship between performance on
one measurement and performance on another (or
actual behavior). Two forms: concurrent and predictive.
Assessed with correlation coefficients
• Construct validity: directly connect measurement with
theory. Allows interpretation of empirical evidence in
terms of theoretical relationships. Based on weight of
evidence. Convergent and discriminant evidence.
Multitrait-Multimethod Analysis (MTMM; see the sketch below)
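To illustrate what convergent and discriminant evidence look like in an MTMM-style analysis, here is a small Python sketch; the constructs, methods, and all numbers are hypothetical:

    import numpy as np

    # Hypothetical scores for 6 students: two constructs, each measured two ways
    ps_exam  = np.array([55, 78, 62, 90, 70, 66])        # problem solving via exam
    ps_obs   = np.array([58, 75, 60, 88, 73, 64])        # problem solving via observation
    mot_surv = np.array([4.1, 2.5, 3.8, 3.0, 2.2, 4.4])  # motivation via survey
    mot_int  = np.array([4.0, 2.7, 3.6, 3.2, 2.0, 4.5])  # motivation via interview

    def r(a, b):
        return np.corrcoef(a, b)[0, 1]

    # Convergent evidence: same trait, different methods -> should be high
    print("problem solving, exam vs. observation:", round(r(ps_exam, ps_obs), 2))
    print("motivation, survey vs. interview:", round(r(mot_surv, mot_int), 2))

    # Discriminant evidence: different traits -> should be noticeably lower
    print("problem solving vs. motivation:", round(r(ps_exam, mot_surv), 2))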
25. Unified View of Construct Validity
(Messick S, Amer Psych, 1995)
• Validity is not a property of an instrument but rather of
the meaning of the scores. Must be considered
holistically.
• 6 Aspects of Construct Validity Evidence
– Content—content relevance & representativeness
– Substantive—theoretical rationale for observed consistencies in
test responses
– Structural—fidelity of scoring structure to structure of construct
domain
– Generalizability—generalization to the population and across
populations
– External—convergent and discriminant evidence
– Consequential—intended and unintended consequences of
score interpretation; social consequence of assessment
(fairness, justice)
26. Finding Measurement Instruments
• Scan the engineering education literature (obviously)
• Email engineering ed researchers (use the network)
• Examine literature for instruments used in prior studies
• General education/social science instrument databases
– Buros Institute of Mental Measurements (Mental
Measurements Yearbook, Tests in Print)
http://buros.unl.edu/buros/jsp/search.jsp
– ERIC databases http://www.eric.ed.gov/
– Educational Testing Service Test Collection
http://www.ets.org/testcoll/index.html
• Construct your own (last resort!)
– Get some expert consultation (test writing, survey
design, questionnaire construction, etc.)
27. Example
• In your groups, analyze the Steif & Dantzler
statics concept inventory article. Look for:
– Theoretical framework
– Constructs used in the study
– How constructs were operationalized
– Measurement process
• Attention to reliability and validity
28. References
• Campbell DT, Stanley JC. Experimental and quasi-
experimental designs for research. Chicago: Rand
McNally; 1969.
• Cook TD, Campbell DT. Quasi-experimentation: design and
analysis issues for field settings. Chicago: Rand McNally; 1979.
• Messick S. Validity of psychological assessment:
validation of inferences from persons' responses and
performances as scientific inquiry into score meaning.
American Psychologist. 1995;50:741-749.
• Messick S. Validity. In: Linn RL, ed. Educational
measurement. 3rd ed. New York: American Council on
Education & Macmillan; 1989:13-103.
Editor's notes
90 minute session
Steif analysis = 40 min?
Learning (cognitive theory, constructivist theory, social cognitive, plus some current interesting things derived from each, like expert-novice differences, transfer issues, ?)
Motivation (probably goal theory, self-efficacy, expectancy value, self-determination, maybe something on negative motivation like anxiety)
Developmental (probably cognitive development a la Perry, epistemological development, Baxter-Magolda, etc.) Individual differences (prior knowledge, development, motivation, strategy repertoires and self-regulation, styles, etc.)
Highlight 3. Methods as the item for this session - how we get the data to permit ‘direct investigation’
Also relevant to 1. “Empirically”
A study design consists of decisions about several issues and the arrangement or timing of events in the study.
What you are measuring stems quite directly from the hypothesis or research question, which identifies the outcome or phenomenon of interest (learning or time use or cost, etc.). We'll address this in more detail in the next topic.
The selection and assignment of participants also should follow from the hypothesis, but frequently, we do research on ‘convenience samples’ of whatever students we can get access to, whether they are appropriate or not.
The intervention has to be defined quite clearly, both in terms of activities and timing. This is particularly true for complex educational interventions. Going back to our videotaped lecture example, we need to define whether the intervention is defined as access to videotapes of all lectures, access to those of a specific course, or to that of a specific lecture.
The sequence and timing of measurements and intervention(s) is another critical decision. Measuring outcomes immediately after the intervention is most likely to show an impact, but a delayed measurement will more accurately assess how lasting the impact might be. You can, of course, do multiple measurements at various times, but all these need to be defined as part of the study design.
The whole issue of randomization is the other problem that plagues most medical education studies.
The most common, and often unrecognized, manifestation of this is in the selection of students for the study. Not only are medical students a highly (and nonrandomly) selected population to begin with, but our studies often take students who self-select for specific educational activities or elect to participate or not participate on a non-random basis.
The other problem with randomization is the one I just mentioned in the previous slide - that of non-random assignment of students. In our videotape example, we have the problem of students self-selecting to view the videotapes or not. It is feasible to imagine random assignment of students to view the tapes or not, but that creates ethical as well as pragmatic problems.
Too many education researchers content themselves with a simple description of a program or an intervention or an observation, supported by some data collected from one group of students at one point in time. While this kind of research provides some useful information, the absence of a comparison group prevents us from being able to fully interpret the value of the intervention. We need to compare these results to SOMETHING and the better the quality of that ‘something,’ the better the study design.
One fairly simple comparison group is the same students prior to the intervention. Although this isn‘t the strongest design, it is better than nothing and often feasible to do.
Another design would be to find a comparison group that, while not entirely equivalent to the intervention group, serves as a useful point of reference. An example of this would be to compare the intervention students to students at the same point in the curriculum from previous years. We don't know all the ways in which the two cohorts might differ, besides the intervention, so it isn't problem-free, but again, it provides a useful comparison.
The best design would be to randomly assign students to the control and intervention conditions. While scientifically strong, it is seldom pragmatically feasible.
Strengths
Useful in exploring new problems
Developing ideas or devices
Weaknesses
No control and no internal validity
No ability to make comparisons (conclusions can only be impressionistic or imprecise; using historical or standardized populations as comparisons is unwise)
Strengths
No effect of pretesting
Useful when pretests are unavailable, inconvenient or too expensive
Also, useful when participant anonymity must be maintained
Weaknesses
No ability to measure the effect of the intervention (treatment)
Controls for, but cannot estimate, the effects of maturation and history
Possible selection differences (groups could be different in some fundamental way)
Reactive effects of experimental procedures?
Strengths
Compares the performance of the same group
Controls for selection (if same participants)
Controls for mortality (if same participants)
Weaknesses
No assurance that the intervention is the only factor in the difference between O1 and O2
Threats to validity
History
Maturation
Testing effects
Statistical regression (for extreme groups)
Reactive effects of experimental procedures?
Strengths
Good internal validity
Control groups allow us to estimate the effects of
History
Maturation
Testing effects
Controls mortality effects (by checking pre and post measures)
Weaknesses
Possible selection differences (groups could be different in some fundamental way)
Reactive effects of experimental procedures?
Theory—Conceptualization by specifying precisely what we mean by a term (e.g., learning, expertise, socialization, motivation, etc.)
Constructs: theoretical creations based on observations but which cannot be observed directly or indirectly. Hypothetical; abstract, defined concepts. Created by scientists. Come from theory. E.g. learning, problem solving, critical thinking, cognitive development, attribution, locus of control.
Operational definition: spells out precisely how the concept will be measured - what the variables are. A description of the operations that will be used to measure the concept. In education, these typically depend on some behavior on the part of the learners - answering questions on a survey, making presentations, solving problems, working in groups, etc. It must be observable [remember - "empirical"]
Measurement: this critical step is central to qualitative and quantitative research. It is more apparent in quantitative research, but the issues, challenges, and decisions are analogous. We will focus on quantitative applications and examples, but keep in mind that the principles also apply to qualitative research methods.
So we will spend our session today looking at principles of educational measurement.
Scenario: You've noticed that students vary considerably in how they react to feedback in the form of grades or written evaluations. Some take any criticism as a personal attack whereas others seem to be immune to any efforts you make to tell them they need to improve their performance. Like a good educational researcher, you investigate what the literature has to say on the matter and stumble across a theoretical framework called "Attribution Theory" that seems relevant.
Describe attribution theory
Examples of attributions: driving and someone blows their horn at you or flips you the finger - intrinsic or extrinsic attribution - my problem or his?
Golf shots: good ones are due to my ability, bad ones are due to luck
The constructs are generally internal, especially in constructivist and cognitive theoretical frameworks. Behaviorism is attractive in that these internal states don't matter.
Examples: Stability - administer your final exam in thermodynamics on the last day of class and re-administer it a day later to the same people. You would expect the results to be the same. If you did this on the first day of class and again on the last, you'd expect the scores to change.
Equivalence - two bathroom scales should give you the same weight in the morning. Two versions of final exam that test the same content should as well.
Internal consistency - to what extent are all the items on the exam measuring the same construct - thermodynamics. If some are on thermodynamics and others on hydrodynamics, the test is not internally consistent and you should derive two scores from it rather than one.
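A small Python sketch of that last point, with invented item responses: a test that mixes two constructs yields a low alpha overall, while each homogeneous subscale is more consistent, which is the signal to report two scores:

    import numpy as np

    def cronbach_alpha(items):
        # items: examinee-by-item score matrix
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    # First 3 items thermodynamics, last 3 hydrodynamics; examinees strong
    # in one topic are not necessarily strong in the other
    thermo = np.array([[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 1]])
    hydro  = np.array([[0, 0, 1], [1, 1, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0]])
    mixed  = np.hstack([thermo, hydro])

    print(round(cronbach_alpha(mixed), 2))   # mixed test: low alpha (~0.26)
    print(round(cronbach_alpha(thermo), 2))  # thermodynamics items alone (~0.71)
    print(round(cronbach_alpha(hydro), 2))   # hydrodynamics items alone (~0.75)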
Content: determining boundaries of the construct domain. Determining the knowledge, skills, attitudes, motives and other attributes to be revealed by the measurement tasks. Addressed by means of job analysis, task analysis, curriculum analysis, domain theory. Must also attend to the representativeness of the tasks selected for assessment.
Substantive: Emphasizes role of substantive theories and process modeling in identifying the domain processes to be revealed in assessment tasks. Derived from think-aloud protocols, correlation patterns among part scores, modeling of task performance.
Structural: Theory should not only guide selection of relevant tasks (substantive) but also the development of scoring criteria and rubrics.
Generalizability: Interpretations not limited to the sample of assessed tasks but be broadly generalizable to the construct domain.
External—MTMM
Consequential: Social and value-related issues. Should accrue evidence of purported positive consequences. Primary issue is that any negative impact should not be derived from any source of test invalidity.
Debrief in general asking for volunteers to comment on each of the four dimensions. Theory should be challenging in the sense that it is not apparent in the article.