For most people, test scores are an important fact of life. But what makes those numbers so meaningful?
What are Test Scores?
● expressed as numbers
● used to describe, make inferences, and draw conclusions from those numbers
Scales of Measurement
Scale - a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.
○ Continuous Scale (e.g., blood pressure, measurements taken to install venetian blinds)
○ Discrete Scale (e.g., male/female subjects)
4 Scales of Measurement (NOIR)
1. Nominal Scales
2. Ordinal Scales
3. Interval Scales
4. Ratio Scales
Nominal Scales
● could simply be called “labels”: categories with no inherent order
Ordinal Scales
• sounds like “order”: values can be ranked, but the distances between ranks are not necessarily equal
Interval Scales
• equal “space in between” adjacent values
• don’t have a “true zero”
Ratio Scales
• have order, an exact value between units, AND an absolute zero
• Examples: reaction time, and individual scores such as “number of items correctly recalled” or “number of errors”
Frequency Distribution
• Simple frequency distribution - all scores listed alongside the number of times each occurred
• Grouped frequency distribution - scores tallied into class intervals
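A short Python sketch (example scores invented) of building both kinds of distribution with the standard library:

```python
# Simple and grouped frequency distributions using collections.Counter.
from collections import Counter

scores = [66, 65, 61, 59, 53, 52, 41, 36, 35, 66, 32]

# Simple frequency distribution: each score with its frequency
simple = Counter(scores)
print(sorted(simple.items(), reverse=True))   # [(66, 2), (65, 1), ...]

# Grouped frequency distribution: scores tallied into class intervals of width 10
grouped = Counter((score // 10) * 10 for score in scores)
for lower in sorted(grouped, reverse=True):
    print(f"{lower}-{lower + 9}: {grouped[lower]}")
```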
How to illustrate a frequency distribution graphically - 3 kinds of graphs:
• Histogram
• Bar graph
• Frequency polygon
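A sketch of the three graph types using matplotlib (an assumed choice of plotting library; the frequencies are invented for illustration):

```python
import matplotlib.pyplot as plt

scores = [66, 65, 61, 59, 53, 52, 41, 36, 35, 66, 32]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: adjacent bars over class intervals
ax1.hist(scores, bins=range(30, 80, 10))
ax1.set_title("Histogram")

# Bar graph: separated bars, suited to discrete categories
labels, counts = ["30s", "40s", "50s", "60s"], [3, 1, 3, 4]
ax2.bar(labels, counts)
ax2.set_title("Bar graph")

# Frequency polygon: interval midpoints joined by straight lines
midpoints, freqs = [35, 45, 55, 65], [3, 1, 3, 4]
ax3.plot(midpoints, freqs, marker="o")
ax3.set_title("Frequency polygon")

plt.show()
```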
Measures of Central Tendency
Central Tendency - the typical or average score in a group of scores
• Mean - the “arithmetic mean” or “average”
• Median - the “middle” value
• Mode - the number that is repeated more often than any other
Mean
• denoted by the symbol X̄ (“X-bar”)
• is equal to the sum of the observations divided by the number of observations
• For raw scores: X̄ = ΣX / n
• For a grouped frequency distribution: X̄ = Σ(f · X) / Σf, where X is the midpoint of each class interval and f is its frequency
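A minimal Python sketch of both mean computations, using invented scores and class intervals:

```python
# Raw scores: sum divided by the number of observations
raw = [66, 65, 61, 59, 53, 52, 41, 36, 35, 32]
mean_raw = sum(raw) / len(raw)
print(mean_raw)          # 50.0

# Grouped frequency distribution: weight each interval midpoint by its frequency.
# Grouping only approximates the raw mean (midpoints stand in for the actual scores).
midpoints = [35, 45, 55, 65]
freqs = [3, 1, 3, 3]
mean_grouped = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)
print(round(mean_grouped, 2))
```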
Median
• the middle score when the scores are arranged in order
• for an odd number of scores: the middle element
• for an even number of scores: add the two middle elements and divide by 2
Given: 66, 65, 61, 59, 53, 52, 41, 36, 35, 32
Step 1: locate the two middle elements: 53 and 52
Step 2: (53 + 52)/2 = 105/2 = 52.5
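A small Python sketch of the odd/even rule, reproducing the 52.5 result from the worked example:

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                   # odd: the single middle element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2     # even: mean of the two middle elements

scores = [66, 65, 61, 59, 53, 52, 41, 36, 35, 32]
print(median(scores))    # 52.5
```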
Mode
• The most frequently occurring score in a distribution of
scores.
Given: 66, 65, 61, 59, 53, 52, 41, 36, 35, 66, 32
Mode: 66
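The same result with the standard library, using the scores from the slide:

```python
from statistics import mode

scores = [66, 65, 61, 59, 53, 52, 41, 36, 35, 66, 32]
print(mode(scores))      # 66 (occurs twice, more than any other score)
```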
Measures of Variability
Statistics that describe the amount of variation in a distribution:
• Range
• Interquartile Range
• Semi-interquartile Range
• Average Deviation
• Standard Deviation
• Variance
Range
• is equal to the difference between the highest and the
lowest scores.
Given the following sorted data, find the range.
12, 15, 19, 24, 25, 26, 30, 35, 38
R= HV-LV
R= 38-12
R= 26
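In code, using the data above:

```python
# Range: highest value minus lowest value.
data = [12, 15, 19, 24, 25, 26, 30, 35, 38]
r = max(data) - min(data)
print(r)     # 26
```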
The Interquartile and Semi-interquartile Ranges
• used to measure how spread out the middle 50% of the data points in a set are
• IQR = Q3 - Q1
• Semi-interquartile range = IQR/2
• Example (Q1 = 64, Q3 = 77): Interquartile range: 77 - 64 = 13; Semi-interquartile range: 13/2 = 6.5
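A sketch using numpy's percentile function (quartile conventions differ slightly across packages, so results can vary at the margins; the data are invented):

```python
import numpy as np

data = np.array([12, 15, 19, 24, 25, 26, 30, 35, 38])

q1, q3 = np.percentile(data, [25, 75])   # first and third quartiles
iqr = q3 - q1                            # interquartile range
semi_iqr = iqr / 2                       # semi-interquartile range
print(q1, q3, iqr, semi_iqr)
```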
Average Deviation
• denoted as AD
• Formula: AD = Σ|X - X̄| / n (the mean of the absolute deviations of the scores from the mean)
• provides a solid foundation for understanding the conceptual basis of another, more widely used measure: the standard deviation
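A minimal sketch of the AD computation under the formula above (data invented):

```python
# Average (mean absolute) deviation: AD = sum(|X - mean|) / n
scores = [12, 15, 19, 24, 25, 26, 30, 35, 38]
mean = sum(scores) / len(scores)
ad = sum(abs(x - mean) for x in scores) / len(scores)
print(round(ad, 2))
```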
Standard Deviation
• a measure of variability equal to the square root of the variance
• Formula: SD = √( Σ(X - X̄)² / n )
Variance
• Is the square of the standard deviation.
• Note: The larger the variance, the greater the
variability or the distance of scores from the
mean. The smaller the variance, the lesser the
variability.
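A sketch of variance and standard deviation together (population formulas, dividing by n; sample formulas divide by n - 1 instead; data invented):

```python
scores = [12, 15, 19, 24, 25, 26, 30, 35, 38]
mean = sum(scores) / len(scores)

# Variance: mean of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# Standard deviation: square root of the variance
sd = variance ** 0.5
print(round(variance, 2), round(sd, 2))
```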
Skewness
• an indication of the nature and extent to which symmetry is absent in a distribution (e.g., scores piling up at one end with a tail toward the other)
Kurtosis
• the steepness of a distribution in its center
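One way to quantify both, using scipy.stats as an assumed library choice (data invented):

```python
from scipy.stats import skew, kurtosis

scores = [32, 35, 36, 41, 52, 53, 59, 61, 65, 66, 66]
print(skew(scores))         # negative here: the tail points toward the low scores
print(kurtosis(scores))     # excess kurtosis relative to a normal distribution
```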
Standard Scores
● a raw score that has been converted from
one scale to another scale.
○ z Score
○ T Score
○ Stanines
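A sketch of the conversions; the z and T formulas and the stanine approximation are the standard ones, and the raw scores are invented:

```python
# z = (X - mean) / SD; T = 10z + 50; stanine has mean 5, SD ~2, clipped to 1-9.
import statistics

raw = [32, 35, 36, 41, 52, 53, 59, 61, 65, 66, 66]
mean = statistics.mean(raw)
sd = statistics.pstdev(raw)              # population standard deviation

for x in raw[:3]:
    z = (x - mean) / sd                  # z score: mean 0, SD 1
    t = 10 * z + 50                      # T score: mean 50, SD 10
    stanine = min(9, max(1, round(2 * z + 5)))   # stanine: 1-9 scale
    print(x, round(z, 2), round(t, 1), stanine)
```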
Standard Scores
Why convert raw scores to standard scores?
7 Assumptions of Psychological Testing and Assessment
1. Psychological Traits and States Exist
2. Psychological Traits and States Can Be Quantified and Measured
3. Test-Related Behavior Predicts Non-Test-Related Behavior
4. Tests and Other Measurement Techniques Have Strengths and Weaknesses
5. Various Sources of Error Are Part of the Assessment Process
6. Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
7. Testing and Assessment Benefit Society
Norms
are the test performance data
of a particular group of
testtakers that are designed
for use as a reference when
evaluating or interpreting
individual test scores.
Types of Norms
01 Percentile Norms - raw data converted into percentile form
02 Age Norms - age-equivalent scores
03 National Norms - derived from a sample that is nationally representative of the population
04 National Anchor Norms - provide stability to test scores by anchoring scores on one test to scores on another
05 Grade Norms - average test performance of testtakers in a given grade
06 Subgroup Norms - norms for a defined subgroup (e.g., socioeconomic status, race)
07 Local Norms - provide normative information for a specific local population of testtakers
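As an illustration of the first type, a sketch of converting a raw score to a percentile rank against an invented norm group (other conventions count half of tied scores as well):

```python
norm_group = [32, 35, 36, 41, 52, 53, 59, 61, 65, 66, 66]

def percentile_rank(raw, group):
    """Percentage of scores in the norm group falling below the raw score."""
    below = sum(score < raw for score in group)
    return 100 * below / len(group)

print(round(percentile_rank(59, norm_group), 1))    # 54.5: 6 of the 11 scores fall below 59
```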
Norm-Referenced versus Criterion-Referenced Evaluation
● Norm-Referenced Testing and Assessment - evaluating an individual test taker’s score by comparing it to the scores of a group of test takers.
● Criterion-Referenced Testing and Assessment - evaluating an individual’s score with reference to a set standard.
Sampling to Develop Norms
01 Test Standardization - administering the test to develop norms
02 Sampling - selecting a portion of the population
03 Sample of a Population - a portion that represents the population
04 Stratified Sampling - developing a sample from identified subgroups of the population
05 Convenience Sample - a sample drawn from the part of the population that is close at hand
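A sketch (invented population) contrasting simple, stratified, and convenience sampling:

```python
import random

random.seed(0)
population = [{"id": i, "grade": g} for g in (7, 8, 9) for i in range(100)]

# Simple random sample: select a portion of the population at random
simple = random.sample(population, 30)

# Stratified sample: sample from each identified subgroup (stratum) separately
stratified = []
for g in (7, 8, 9):
    stratum = [p for p in population if p["grade"] == g]
    stratified.extend(random.sample(stratum, 10))

# Convenience sample: whatever part of the population is close at hand
convenience = population[:30]

print(len(simple), len(stratified), len(convenience))
```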
Correlation and Inference
● Inferences- deduced conclusions
● Correlation- an expression of the degree and
direction of correspondence between two
things.
○ Pearson r
○ Spearman’s rho
○ Regression
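A sketch of the three techniques via scipy.stats, an assumed library choice, with invented paired observations:

```python
from scipy.stats import pearsonr, spearmanr, linregress

x = [2, 4, 5, 7, 9, 10]
y = [3, 5, 4, 8, 9, 12]

r, p = pearsonr(x, y)            # Pearson r: degree and direction of linear correspondence
rho, p_rho = spearmanr(x, y)     # Spearman's rho: rank-order correlation
reg = linregress(x, y)           # Regression: predict y from x (slope, intercept, r, ...)

print(round(r, 3), round(rho, 3), round(reg.slope, 3), round(reg.intercept, 3))
```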
Culturally Informed Assessment: Do vs. Do Not
• Do: Be aware of the cultural assumptions on which a test is based. Do not: take those assumptions for granted.
• Do: Consider consulting with members of particular cultural communities. Do not: take for granted that members of all cultural communities will automatically deem particular techniques appropriate for use.
• Do: Strive to incorporate assessment methods suited to the assessee’s culture. Do not: take a “one-size-fits-all” view of assessment.
• Do: Be knowledgeable about the many alternative tests or measurement procedures. Do not: select tests or other tools of assessment with little or no regard for the extent to which such tools are appropriate for use.
• Do: Be aware of equivalence issues across cultures. Do not: assume that a test translated into another language is automatically equivalent to the original.
• Do: Score, interpret, and analyze assessment data in its cultural context. Do not: score, interpret, and analyze assessment data in a cultural vacuum.
Reliability
the degree to which scores from a test are stable and results are consistent.
4 Ways to Assess Reliability
01 Test-Retest - stability, temporal consistency
02 Inter-rater - consistency among independent judges
03 Parallel/Alternate Forms - stability and equivalence
04 Internal Consistency - homogeneity of the items
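A sketch of two of these estimates with invented data: test-retest reliability as a Pearson correlation between two administrations, and internal consistency via coefficient (Cronbach's) alpha, a common index the slide does not name explicitly:

```python
import numpy as np

# Test-retest: correlate scores from administration 1 and administration 2
time1 = np.array([10, 12, 15, 9, 14, 11])
time2 = np.array([11, 12, 14, 10, 15, 10])
test_retest_r = np.corrcoef(time1, time2)[0, 1]

# Internal consistency: coefficient alpha, items as columns, persons as rows
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(round(test_retest_r, 3), round(alpha, 3))
```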
● A score on an ability test is presumed to reflect not only the testtaker’s true score on the ability being measured but also error.
● This error component is expressed in terms of variance (error variance).
Sources of Error Variance
01 Test Construction
02 Test Administration
03 Test Scoring
04 Interpretation
Conclusion...
VALIDITY
• Validity is a term used in conjunction with the meaningfulness of a test score, or in other words, what the test score truly means.
• Validity, as applied to a test, is a judgment or estimate of how well a test measures what it purports to measure in a particular context.
• Characterizations of the validity of tests and test scores are frequently phrased in terms such as “acceptable” or “weak.”
THE THREE CATEGORIES OF VALIDITY
• One way measurement specialists have traditionally conceptualized validity is according to three categories:
1. Content Validity
2. Criterion-related Validity
3. Construct Validity
CONTENT VALIDITY
• Content validity refers to the extent to which a measure represents all facets of a given construct.
• Refers to the extent to which the items on a test are fairly representative of the entire domain the test seeks to measure.
• For example, a depression scale may lack content validity if it only assesses the affective dimension of depression but fails to take into account the behavioral dimension.
CONTENT VALIDITY RATIO
• One method of measuring content validity, developed by C. H. Lawshe, is essentially a method for gauging agreement among raters or judges regarding how essential a particular item is.
• Each item is rated as either:
⮚ Essential
⮚ Useful but not essential, or
⮚ Not necessary
CVR FORMULA:
CVR = (ne - N/2) / (N/2)
where CVR = content validity ratio, ne = number of panelists indicating “essential,” and N = total number of panelists.
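A one-function sketch of the formula (the panel counts are invented):

```python
def content_validity_ratio(n_essential, n_panelists):
    """CVR = (ne - N/2) / (N/2), per the formula above."""
    return (n_essential - n_panelists / 2) / (n_panelists / 2)

# Example: 8 of 10 panelists rate the item "essential"
print(content_validity_ratio(8, 10))    # 0.6
```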
CRITERION-RELATED VALIDITY
• A criterion is defined broadly as a standard on which a judgment or decision may be based.
• Criterion-related validity is a judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest.
• That measure of interest is the criterion.
• Two types of validity evidence are subsumed
under the heading of criterion-related validity:
⮚ Concurrent validity
⮚ Predictive validity
• Concurrent validity refers to the extent to which the results
of a measure correlate with the results of an established
measure of the same or a related underlying construct
assessed within a similar time frame.
• On the other hand, if the measure is correlated with a future assessment, this is termed predictive validity. It is the extent to which a score on a scale or test predicts scores on some criterion measure.
CONSTRUCT VALIDITY
• A construct is an informed, scientific idea developed or hypothesized to describe or explain behavior.
• Construct validity is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.
• It is the degree to which a test measures what it claims, or purports, to be measuring.
• Increasingly, construct validity has been viewed as the
unifying concept for all validity evidence (American
Educational Research Association et al., 1999).
• As we noted at the outset, all types of validity evidence,
including evidence from the content- and criterion-
related varieties of validity, come under the umbrella of
construct validity.
UTILITY
• In everyday language, we use the term utility to refer to the usefulness of something or some process.
• In the language of psychometrics, utility means much the same thing; it refers to how useful a test is.
• More specifically, it refers to the practical value of using a test to aid in decision-making.
• We may define utility in the context of testing and assessment as the usefulness or practical value of testing to improve efficiency.
• Moreover, judgments about a test’s utility can easily be affected by the test’s psychometric soundness, its costs, and its benefits.
⮚ Psychometric soundness (pertaining to the reliability and validity of a test).
⮚ Cost (pertains to how much budget is put into the test/research).
⮚ Benefit (answers the question: would the overall time and effort of testing even be worth it?).
UTILITY ANALYSIS
• A utility analysis may be broadly defined as a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.
• In a most general sense, a utility analysis may be undertaken for the purpose of evaluating whether the benefits of using a test outweigh the costs.
METHODS FOR SETTING CUT SCORES
• A cut score (or cutoff score) is the minimum score on an exam, standardized test, high-stakes test, or other form of assessment that a testtaker must earn to either “pass” or be considered “proficient.”
• Here are some examples of methods for setting cut scores:
THE ANGOFF METHOD
• Devised by William Angoff (1971), it is a kind of study that test developers use to determine the passing percentage (cut score) for a test.
• This method for setting fixed cut scores can be applied to personnel selection tasks as well as to questions regarding the presence or absence of a particular trait, attribute, or ability.
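A minimal sketch of the usual Angoff procedure, with invented judge ratings: each judge estimates the probability that a minimally competent testtaker would answer each item correctly, and the averaged estimates are summed to give the cut score:

```python
judges = [
    [0.7, 0.6, 0.9, 0.5, 0.8],   # judge 1's probability estimates for items 1-5
    [0.6, 0.7, 0.8, 0.6, 0.9],   # judge 2
    [0.8, 0.5, 0.9, 0.4, 0.7],   # judge 3
]

n_items = len(judges[0])
item_means = [sum(j[i] for j in judges) / len(judges) for i in range(n_items)]
cut_score = sum(item_means)                  # expected number of items correct
print(round(cut_score, 2), "out of", n_items)
```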
THE KNOWN GROUPS METHOD
• Also referred to as the method of contrasting groups, the known groups method entails collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest. Based on an analysis of these data, a cut score is set on the test that best discriminates the two groups’ test performance.
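A simple sketch of the idea with invented group data: try candidate cut scores and keep the one that best separates the two groups (maximizes correct classification):

```python
has_trait = [78, 82, 85, 88, 90, 91]       # scores from the group known to possess the trait
lacks_trait = [60, 65, 68, 70, 74, 79]     # scores from the group known not to possess it

best_cut, best_accuracy = None, -1.0
for cut in range(min(lacks_trait), max(has_trait) + 1):
    correct = sum(s >= cut for s in has_trait) + sum(s < cut for s in lacks_trait)
    accuracy = correct / (len(has_trait) + len(lacks_trait))
    if accuracy > best_accuracy:
        best_cut, best_accuracy = cut, accuracy

print(best_cut, round(best_accuracy, 2))
```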
IRT-BASED METHODS
• In classical test theory, cut scores are typically set based on testtakers’ performance across all the items on the test; some portion of the total number of items on the test must be scored “correct” in order for the testtaker to “pass” the test.
• Whereas classical test theory focuses on the test as a whole, item response theory shifts the focus to the individual items (questions) themselves.
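To illustrate the item-level focus, a sketch of the two-parameter logistic (2PL) item response function with invented item parameters:

```python
import math

def p_correct(theta, difficulty, discrimination):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# An item of moderate difficulty (b = 0.5) and good discrimination (a = 1.5):
# the probability of answering correctly rises with the testtaker's ability (theta).
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct(theta, difficulty=0.5, discrimination=1.5), 3))
```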
TEST DEVELOPMENT
• “All tests are not created equal.” The creation of a good test is not a matter of chance.
• It is the product of the thoughtful and sound application of established principles of test construction.
• Making a good test is a five-step process.
THE FIVE STAGES IN DEVELOPING A TEST:
1. Test Conceptualization (come up with an idea for a test).
2. Test Construction (draft a plan of what builds and fashions the test).
3. Test Tryout (try the test out on sample testtakers/participants).
4. Item Analysis (analyze which items or areas of the test need revising or changing, including eliminating irrelevant parts).
5. Test Revision (the test is perfected by revision of the second draft).