1. Validity! We need to find out if our research is sound. Do our tests measure what they claim to measure?
2. Are the techniques used to collect data in tests, questionnaires, interviews and observations measuring what is claimed? For example, was the Strange Situation really measuring attachment style?
3. We need to be able to measure or observe something time after time and produce the same or similar results.
4. I want to measure intelligence. If the same person sits the test on several occasions and the results change each time, then that test lacks reliability.
5. The test also arguably lacks validity, because the scores are meaningless.
6. If I test my participants again several months later and their scores remain consistent, I can say the test is reliable, but it might still lack validity.
7. Is an A level in Psychology a valid and reliable assessment of your performance in Psychology?
8. This measures consistency from one occasion to another – the same result should be found on different days, in different labs, observations or interviews, and by different researchers.
(Cartoon: “I exposed these teenage brain cells to 1000 PowerPoint slides last Monday and they’re all dead.” “I thought that was a fluke, but they seem to be shrivelling after only five minutes!”)
9. Participants take the same test on different occasions – a high correlation between test scores indicates the test has good external reliability. Timing is crucial. Why?
(Cartoon: the same participant sits the test in January and again in June: “I hope that’s the right answer this time.”)
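In practice, test-retest reliability is usually reported as a correlation coefficient between the two sets of scores. A minimal Python sketch, assuming numpy is available and using invented scores for ten participants tested in January and again in June (all numbers are purely illustrative):

```python
import numpy as np

# Invented test scores for the same ten participants,
# sitting the same test in January and again in June
january = np.array([98, 104, 110, 95, 121, 100, 108, 115, 92, 103])
june = np.array([101, 102, 113, 97, 118, 99, 110, 117, 95, 105])

# Pearson correlation between the two administrations:
# a value close to +1 indicates good external (test-retest) reliability
r = np.corrcoef(january, june)[0, 1]
print(f"Test-retest correlation: r = {r:.2f}")
```

A commonly used rule of thumb is to treat a correlation of about +0.8 or above as acceptable reliability.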
10. This refers to the consistency of a researcher’s behaviour. A researcher should produce similar test results, make similar observations, or carry out interviews in the same way on more than one occasion.
(Cartoon: the same researcher on two occasions. Occasion 1: “Thanks for taking part today. Any problems and I’ll be right over. Take your time.” Occasion 2: “Right. Let’s get on. Fast as you can. How much longer before I can get in the pub and relax my facial muscles?”)
11. In observational studies this is known as inter-observer reliability – observers have to agree on what they see and carry out the same procedure. Consistency between different researchers working on the same study is very important for reliability.
12. Ways to increase reliability: (1) standardise instructions; (2) carry out a pilot study to improve procedures and materials; (3) train researchers thoroughly in the use of materials and procedures before the study takes place.
13. This measures the extent to which a test or procedure is consistent within itself, i.e. questionnaire items or questions in an interview should all be measuring the same thing.
Do you like to keep to deadlines?
Do you get impatient driving?
Do you like cheese?
Do you like doing several tasks at once?
Do you like chocolate?
Do you get easily irritated?
Are you competitive?
(This interviewer seems a little confused about Type A personality traits.)
14. Odds/evens or top/bottom: compares a participant’s performance on two halves of a test or questionnaire – there should be a close correlation between scores on both halves of the test. Questions in both halves should be of equal quality for good internal reliability.
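As a rough illustration of the odds/evens split, here is a minimal Python sketch (the questionnaire data are invented and numpy is assumed). It totals the odd-numbered and even-numbered items for each participant and correlates the two halves:

```python
import numpy as np

# Invented item scores (1-5) on a 10-item questionnaire:
# rows = participants, columns = items
scores = np.array([
    [4, 5, 3, 4, 5, 4, 3, 4, 5, 4],
    [2, 1, 2, 3, 1, 2, 2, 1, 2, 3],
    [5, 4, 5, 5, 4, 5, 4, 5, 4, 5],
    [3, 3, 2, 3, 3, 2, 3, 3, 2, 3],
    [1, 2, 1, 1, 2, 1, 2, 1, 1, 2],
])

# Odds/evens split: total each half for every participant
odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7, 9
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8, 10

# A close correlation between the halves suggests good internal reliability
r = np.corrcoef(odd_half, even_half)[0, 1]
print(f"Split-half correlation: r = {r:.2f}")
```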
15. Would you see this as bullying or horseplay in the playground? You would see it from your own subjective viewpoint – we’re biased by experience and expectation. Observers must agree about what they are observing – they need to use standardised behavioural categories.
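To make “observers must agree” concrete, here is a minimal sketch in plain Python (the incident codings are invented) that computes simple percentage agreement between two observers using the same behavioural categories; chance-corrected measures such as Cohen’s kappa are often preferred in practice:

```python
# Invented codings of the same ten playground incidents by two observers,
# both using the standardised categories "bullying" and "horseplay"
observer_a = ["bullying", "horseplay", "horseplay", "bullying", "horseplay",
              "bullying", "horseplay", "horseplay", "bullying", "horseplay"]
observer_b = ["bullying", "horseplay", "bullying", "bullying", "horseplay",
              "bullying", "horseplay", "horseplay", "bullying", "horseplay"]

# Proportion of incidents on which the two observers agree:
# high agreement indicates good inter-observer reliability
matches = sum(a == b for a, b in zip(observer_a, observer_b))
print(f"Inter-observer agreement: {matches / len(observer_a):.0%}")
```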
16. Measuring Reliability – match the method of estimating reliability to the description:
• Test-Retest reliability: the measure is administered to the same group of people twice. If the results on the two tests are similar, we can assume the test is reliable.
• Split Half Reliability: splitting a test into two halves and comparing the scores in both halves. If the results in the two halves are similar, we can assume the test is reliable.
• Inter-Rater reliability: if the measure depends upon interpretation of behaviour, we can compare the results from two or more raters. If there is high agreement between the raters, the measure is reliable.
17. Internal validity = the tool is measuring what it is intending to measure. External validity = the findings can be generalised beyond the context of the research situation.
18. Face validity: does our measuring tool appear to be doing what it should? One or more judges assess whether the test seems appropriate and suggest changes if necessary.
19. Content validity: does the content of a test cover everything in the area of interest? This is more rigorous – experts in the field systematically examine the tool’s components and compare them with set standards. They have to agree the content is appropriate.
21. Population Validity: can we generalise findings from our research participants to other population groups?
22. Ecological Validity: can we apply our findings to other contexts and situations outside of the research setting?
23. Improving external validity
• The sample must be representative of the target population and be unbiased.
• The research situation must reflect the real-life situation, e.g. the debate over Milgram and the Strange Situation.