Validity in psychological testing refers to whether a test measures what it claims to measure. This presentation discusses the categories of validation procedures in psychological testing: construct identification, criterion prediction, and content description.
5. Questions to Ask to differentiate
Reliability and Validity
In RELIABILITY
Is the test giving CONSISTENT results?
In VALIDITY
Is the test fulfilling its PURPOSE?
6. VALIDITY is more than
It measures what it is supposed to measure.
ALSO,
It verifies the test's capability to predict thinking or
behavior.
It also verifies whether the test will be taken seriously
by test takers.
8. Let's define them one by one:
1. Construct
Identification
Procedure
1.1. Age Differentiation
1.2. Convergent Validity
1.3. Discriminant Validity
1.4. Factor Analysis
1.5. Contrasted Groups
Is the test
measuring what
it claims to
measure?
10. CASE 1.1:
You are a school psychologist and you developed a
psychological test that measures intelligence.
According to your review of the literature
before you constructed the test, one quality of
intelligence is that it increases as one advances in age.
Now that you are done constructing the test, what can
you do to check whether the test you constructed is
really measuring intelligence?
11. General Steps in Case 1.1
1. Administer the test to different grade levels
(i.e., Grades 1, 2, 3) WHY?
2. Get the results
3. See whether mean scores increase with grade level; if they
do, the outcome supports the test's validity.
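The steps above can be sketched in code. This is a minimal illustration with made-up scores; the data, group sizes, and score scale are all assumptions, not real results:

```python
from statistics import mean

# Hypothetical raw scores from administering the new intelligence
# test to three grade levels (illustrative data only).
scores_by_grade = {
    "Grade 1": [12, 15, 11, 14, 13],
    "Grade 2": [18, 20, 17, 21, 19],
    "Grade 3": [25, 27, 24, 28, 26],
}

# Age differentiation: if the construct is age related, mean scores
# should rise with grade level.
means = {grade: mean(s) for grade, s in scores_by_grade.items()}
ordered = [means[g] for g in ("Grade 1", "Grade 2", "Grade 3")]
increases_with_age = ordered == sorted(ordered)

for grade, m in means.items():
    print(f"{grade}: mean = {m}")
print("Pattern consistent with age differentiation:", increases_with_age)
```

If the means do not rise with grade level, the test fails this check even before any correlation is computed.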
15. Important!
This is only applicable when the construct is AGE
RELATED. (How would you know?)
Through the review of literature (REVLIT).
16. Case 1.2
You have constructed a psychological test that measures
optimism. The following are the key details about
optimism you have read from the literature before you
constructed the test.
1) Optimism has nothing to do with age
2) Optimism is strongly correlated to happiness
What would you do to prove that your test is really
measuring optimism?
17. General Steps for Case 1.2
1. Find an established test STRONGLY RELATED to your
test construct (i.e., HAPPINESS)
2. Find a population
3. Administer the two tests to the same population
4. Note the scores for each individual (How many?)
5. Correlate the optimism and happiness scores
18. Let's review correlation
Coefficient    Interpretation
0.00-0.19      Very Weak
0.20-0.39      Weak
0.40-0.59      Moderate
0.60-0.79      Strong
0.80-1.00      Very Strong
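The table can be expressed as a small helper that also reports the direction of the relationship; the function name and return format are just for illustration:

```python
def interpret(r):
    """Map a correlation coefficient to the verbal labels in the
    table above, plus the direction of the relationship."""
    strength = abs(r)
    if strength < 0.20:
        label = "Very Weak"
    elif strength < 0.40:
        label = "Weak"
    elif strength < 0.60:
        label = "Moderate"
    elif strength < 0.80:
        label = "Strong"
    else:
        label = "Very Strong"
    direction = "direct" if r >= 0 else "inverse"
    return f"{label} ({direction})"

print(interpret(0.92))   # Very Strong (direct)
print(interpret(-0.92))  # Very Strong (inverse)
print(interpret(0.30))   # Weak (direct)
```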
19. Possible Outcomes for Case 1.2
Which coefficient fits each scenario? A. +0.92  B. -0.92  C. +0.30
First, the correlation coefficient is direct and strong: +0.92 (A).
Second, the correlation coefficient is inverse and strong: -0.92 (B).
Third, the correlation is weak: +0.30 (C). No pattern.
20. 1.2. Convergent Validity
1.2.1. Correlate the newly constructed test to an established
test with a RELATED construct, e.g.:
Optimism - Happiness
Neuroticism - General Anxiety
HOW would you know the constructs are related? Through the
REVLIT.
21. 1.2. Convergent Validity
1.2.2. Another approach (more recommended): correlate to an
ESTABLISHED test of exactly the SAME construct.
E.g., your newly constructed happiness test correlated with an
established one such as the Oxford Happiness Scale.
22. Why make another test if there are
tests already available in the market?
A. Improvement
If the existing test is for adults, you can make a test for children.
Or if a factor was found unrelated in previous
studies, you may revise the existing test and make a
new one.
B. Addition
If a new factor emerges as a result of
research.
23. Why make another test if there are
tests already available in the market?
C. Adaptation
If you want to adapt a foreign test and make it
indigenous to match your population.
E.g., the Stanford-Binet Intelligence Test
(Lewis Terman's adaptation included language
translation and the addition of items appropriate for the
local population).
For projective personality testing, there is the Hutt Adaptation
of the Bender-Gestalt Test.
24. Why make another test if there are tests
already available in the market?
D. To provide a test that is cheaper than those in the market.
(Analogy: branded medicine vs. generic drugs)
26. 1.3. Discriminant Validity
Correlating a test to an UNRELATED construct
(established test vs. newly constructed test).
E.g., the Wechsler Intelligence Test and your constructed
Optimism Scale.
27. General Steps 1.3
1. Find an ESTABLISHED test of an unrelated construct
(e.g., OPTIMISM vs. intelligence).
2. Administer both tests to the same population.
3. Correlate!
28. Possible Outcomes of Discriminant Validity
First, if the correlation is positive and strong:
the constructed TEST is INVALID.
(Why?) The tests were directly and strongly correlated, yet
discriminant validity requires unrelated constructs to be
uncorrelated.
Second, if the correlation is negative and strong:
the constructed TEST is INVALID. (Why?) They are strongly
correlated, with an inverse relationship.
Third, if the correlation is weak:
the constructed TEST is VALID.
29. Case 1.4
Case: You constructed a test that claims to
measure level of depression.
The country you are in has lost all its copies of the
established depression test that you could
possibly use to establish the validity of your test.
No other psychological test (related or unrelated to
your test) can be used to help you validate your
test. How can you prove that your test really
measures depression?
31. General Steps:
1. Find highly depressed people, administer the test, and
compare their scores with those of people who are not
depressed.
Since there is no established test you may use, you
cannot compute a correlation. In the contrasted groups
method, you only need to compare the group means.
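A sketch of the contrasted groups comparison, using invented scores; note that group membership would come from clinical diagnosis, not from the test being validated:

```python
from statistics import mean

# Hypothetical depression-test scores (illustrative data): one group
# clinically identified as highly depressed, one not depressed.
depressed     = [42, 38, 45, 40, 44, 39, 43]
not_depressed = [15, 12, 18, 14, 16, 13, 17]

# Contrasted groups: with no established test to correlate against,
# compare the group means; a clearly higher mean in the depressed
# group supports the validity of the new test.
m_dep, m_not = mean(depressed), mean(not_depressed)
print(f"Depressed group mean:     {m_dep:.1f}")
print(f"Not-depressed group mean: {m_not:.1f}")
print("Groups separated as expected:", m_dep > m_not)
```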
32. Case: You constructed a test that aims to
measure math ability.
Your REVLIT tells you that math ability contains
4 dimensions: D1, D2, D3, and D4.
You wonder if the test you constructed also
contains these 4 reported in REVLIT.
What would you do to confirm the existence of
these 4 dimensions?
34. 1.4. FACTOR ANALYSIS
i. It counts the NUMBER of dimensions
("factors") the test has.
ii. It identifies which items fall under each factor.
The same applies to any psychological test, e.g., an IQ test
(how many factors does it have?).
38. General Steps: (Factor Analysis)
1. Administer the test
2. Run factor analysis
Possible Outcomes
How does factor analysis decide? By correlating items
with each other: items that correlate highly are grouped
under the same factor.
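One common way to count factors is the Kaiser criterion: the number of eigenvalues of the item correlation matrix that exceed 1. The toy simulation below (invented loadings, numpy assumed available; real analyses use dedicated factor-analysis software) builds 6 items from 2 latent dimensions and recovers the factor count:

```python
import numpy as np

# Toy sketch of the Kaiser criterion. Simulate 200 respondents
# answering 6 items driven by 2 latent dimensions (e.g., D1 and D2
# from the REVLIT), plus noise.
rng = np.random.default_rng(0)
d1 = rng.normal(size=200)
d2 = rng.normal(size=200)
items = np.column_stack([
    d1 + 0.3 * rng.normal(size=200),   # items 1-3 load on D1
    d1 + 0.3 * rng.normal(size=200),
    d1 + 0.3 * rng.normal(size=200),
    d2 + 0.3 * rng.normal(size=200),   # items 4-6 load on D2
    d2 + 0.3 * rng.normal(size=200),
    d2 + 0.3 * rng.normal(size=200),
])

corr = np.corrcoef(items, rowvar=False)      # 6 x 6 item correlations
eigenvalues = np.linalg.eigvalsh(corr)
n_factors = int(np.sum(eigenvalues > 1.0))   # Kaiser criterion
print("Estimated number of factors:", n_factors)
```

Items built from the same latent dimension correlate highly with each other and weakly with the rest, which is exactly the pattern the eigenvalue count picks up.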
42. For Example:
Howard's Risk of Suicide Test (feature: high
predictive validity)
High scorers are more likely to commit suicide;
low scorers are less prone to commit suicide.
44. POSSIBLE USES OF A PREDICTING TEST
1. Prevention of psychological
disorders (guidance counselors)
-> identifies who should be counseled or undergo a
program before a disorder develops.
45. POSSIBLE USES OF A PREDICTING TEST
2. Knowing whom to hire (HR practitioners)
-> saves money by avoiding
hiring the wrong employees.
46. HOW CAN I PREDICT?
Key Term: “CRITERION”
Numerical Expression of Observable
Indicators of a test construct.
47. Examples:
Test Construct: Intelligence
Ask: Numerical Expression?
Criterion: School GWA, IQ Score
Test Construct: Extraversion
Ask: Numerical Expression?
Criterion: Number of events/parties attended
Test Construct: Aggression
Ask: Numerical Expression?
Criterion: Number of school offenses, fistfights.
48. CASE: Your principal bought an expensive
intelligence test to be used as a screening tool
for incoming freshman students. This test is
expensive because of its special ability: It can
predict students who will get high grades
three years after they are admitted to the
university.
Your principal thinks that it is a good test to be
used to help the school decide who will get
scholarships. As a school chief
psychometrician, what would you do to find
out whether this test can really predict
students who will get high college grades?
49. General Steps:
(Testing Predictive Validity of a Test)
1. Administer exam to incoming freshmen students.
2. Score
3. Select an appropriate future criterion
Construct: Intelligence
Appropriate criterion: third-year GPA
50. General Steps:
(Testing Predictive Validity of a Test)
4. WAIT for the future criterion
5. CORRELATE the two scores
            Entrance Exam    Third-Year CGPA
Subject 1        ?                 ?
Subject 2        ?                 ?
Subject 3        ?                 ?
51. Entrance Exam Test Score vs. Third-Year GPA
Say +.93
(interpret)
Relationship: DIRECT and STRONG
Higher entrance exam, higher CGPA (third year)
Lower entrance exam, lower CGPA (third year)
It means the TEST CAN PREDICT
52. Entrance Exam Test Score vs. Third-Year GPA
Say -.93
(interpret)
Relationship: INVERSE and STRONG
Higher entrance exam, lower CGPA (third year)
Lower entrance exam, higher CGPA (third year)
It means the TEST CAN PREDICT, but this inverse pattern is
ridiculous for an intelligence test.
53. Entrance Exam Test Score vs. Third-Year GPA
-0.18
(Interpret)
High or low entrance exam scores have nothing to do
with third-year CGPA.
It means the TEST CAN'T PREDICT.
54. Predictive Validity
Test -> Future criterion
The ability of the test to predict a criterion that will be
observed in the future.
56. Case:
Your HR head asked you to validate your company’s job
efficiency test being used to hire rank and file
employees (i.e., secretaries, encoders, gift wrappers,
salesladies, etc.).
Specifically, your boss wants to find out whether your
company's test can predict the future performance of
those taking it.
Your boss would like to know the answer tomorrow
afternoon. What would you do to confirm if this test
can really predict job efficiency or not?
57. General Steps:
1. ADMINISTER, Score test (Job efficiency)
2. Select existing criterion (?)
Supervisor’s rating (Last month)
3. Correlate test results and supervisor’s rating last
month.
59. 2. CONCURRENT VALIDITY
Test -> Immediately available criterion
The ability of the test to predict an already existing
CRITERION ("PAST").
It is faster than predictive validity.
60. The question now is: why do many still choose
predictive validity if concurrent is faster?
Predictive validity: high accuracy/certainty.
Concurrent validity: faster results.
62. The question now is: why do many still choose
predictive validity if concurrent is faster?
Concurrent validity is used only until predictive
validity is established.
65. Content Description
Procedures
KEY TERM: PRESENTABILITY of the test
Questions to ask:
A. Does the test look like a test?
B. Will test takers take the test seriously?
Now, as psychometricians,
how would we know that the TEST is presentable?
66. There are two methods of
Content Description Procedure
3.1. Content Validity
3.2. Face Validity
67. 3.1. Content Validity
Examines the APPROPRIATENESS of the items of a
psychological test.
Question to be answered:
Do the items belong to the test or not?
68. CONTENT VALIDITY
Are all the items in this test supposed to be in this
test? (“Fits in”)
How to know?
Example: Content Validity of the Fear Scale
69. General Steps in Content Validity
1. Review conceptual definition of fear
Remember, there are two definitions of a concept in a
thesis:
a. Operational definition: how is the variable
measured in the thesis?
b. Conceptual definition: how is the variable
defined in the REVLIT?
70. 2. Compare each item to
conceptual definition
E.g., conceptual definition: FEAR (from REVLIT) - an
emotion caused by a real or imagined threat that can
potentially result in danger, humiliation, or pain.
1. I feel afraid every time I think of myself undergoing a
strenuous physical activity.
2. I feel "cold" when I find my physical safety is
threatened (e.g., not knowing whether the food is
clean or not).
3. I get extremely uneasy when I am not going to do
things well in front of many people.
4. I get excited when I look forward to meeting a friend I
want to spend time with.
71. Who can establish content
validity?
1. Authors of the test (a problem: they may be biased
toward their own items)
2. Subject Matter Experts (SMEs)
72. Quantifying Content Validity
CVI = Ne / N
Ne = number of panelists who agree that an item belongs
N = total number of panelists
Scores range from 0 (low) to 1 (high).
Each panelist will either accept, reject, or ask to revise an item.
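A minimal sketch of this computation, assuming the ratio Ne/N (which matches the 0-to-1 range described above); the function name is illustrative:

```python
def content_validity_index(n_agree, n_panelists):
    """Item-level content validity index: Ne / N, where Ne is the
    number of panelists who agree the item belongs and N is the
    total number of panelists. Ranges from 0 (low) to 1 (high)."""
    if n_panelists <= 0 or not 0 <= n_agree <= n_panelists:
        raise ValueError("need 0 <= n_agree <= n_panelists and n_panelists > 0")
    return n_agree / n_panelists

# E.g., 4 of 5 subject matter experts agree an item fits the construct.
print(content_validity_index(4, 5))  # 0.8
```

An item-level index like this would be computed per item, then low-scoring items rejected or sent back for revision.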