Psychometric Studies in the Development of an Inkjet Printer
1. Psychometric Studies in the
Development of the Kodak AiO
Inkjet Printer
David Lee
Inkjet Systems Division
Eastman Kodak Company
2. 2
The Kodak Value Proposition
… leading to three key vectors of differentiation
“Implementing Kodak’s Digital Imaging System Strategy”
• IMAGE QUALITY (Real Kodak Pictures): look, feel, durability, and fade-resistance of Kodak pictures
• ONE BUTTON, SIMPLE & FAST (Really Easy): simple, convenient photos and text, with and without a PC
• REAL VALUE: affordable pictures, Kodak quality
3. 3
Agenda
Introduction / Overview
• What is psychometrics?
• Thresholds, Just-Noticeable Differences and the Psychometric Function
• Why do we need it?
• Class example
Experimental Considerations
• Unbiased
• Environmental considerations (e.g., lighting)
• Issues & Challenges
– Number of samples
Technical Aspects with Examples
• Measurement/Numerical Scales
• Selected scaling methodologies and data analysis
– Direct versus indirect methods
– Hybrid DASC Method
Summary/Acknowledgements
4. 4
What is Psychometrics?
What is psychophysics/psychometrics? I’m not sure!
•The branch of psychology concerned with quantitative relations between
physical stimuli and their psychological effects
• Scientific approach to studying relationships between human perception
(psychology) and measured physical characteristics
human perception physical measurements
5. 5
Thresholds and Just-Noticeable Differences (JND)
Two Key Questions…
“What is the value of the characteristic that is just visible or just detectable?”
“What is the minimum value of the characteristic that is seen as different from a
reference or standard?”
For example, if we present a stimulus varying in intensity, the observer
responds with ‘Yes’ or ‘No’ indicating whether they see/feel/observe the difference.
•Estimate the probability or likelihood of a ‘successful’ outcome
• Think of shining a light in a dark room, starting at a very low intensity
• The voltage is slowly increased until the observer notices a difference
• Some people will be more sensitive and notice a difference earlier, while
others won’t see a difference until the power is much higher
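The threshold idea above can be sketched numerically: collect ‘Yes’ counts at each intensity, convert them to proportions, and interpolate where the proportion crosses 50%. This is a minimal sketch with made-up data, not measurements from the experiments in these slides.

```python
# Sketch of estimating a detection threshold from yes/no data.
# The intensities and response counts below are illustrative only.

def proportion_seen(yes_counts, n_trials):
    """Convert 'Yes' counts at each intensity into proportions."""
    return [y / n_trials for y in yes_counts]

def threshold_50(intensities, proportions):
    """Linearly interpolate the intensity where P('Yes') crosses 0.50."""
    pairs = list(zip(intensities, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("0.5 not crossed in the measured range")

intensities = [1, 2, 3, 4, 5, 6]        # e.g. lamp voltage steps
yes_counts  = [1, 3, 7, 12, 18, 20]     # 'Yes' responses out of 20 trials
props = proportion_seen(yes_counts, 20)
print(round(threshold_50(intensities, props), 2))   # → 3.6
```

A full analysis would instead fit a smooth psychometric function (e.g., a cumulative Gaussian) to all the points; linear interpolation is just the simplest version of the same idea.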
7. 7
Why do we need psychophysics?
Ties the Voice of the Customer (VOC) to…
Marketing:
•Benchmarking
• Where do we stand with respect to competitors?
•Program Goals
• What’s the ‘best’ level of _______
Product Design & Development:
•Design Trade-offs:
• Print quality versus print speed
• Tonescale (Color Gamut) versus ink laydown
•Troubleshooting
• How noticeable is an artifact?
•Monitor Development Progress
• Where are we with things?
• Has the perception of a characteristic changed over time?
9. 9
Remove the two (2) reference anchors from the envelope labeled “anchors” and
place the darker one to your left and the lighter one to your right. As marked
toward the bottom, these anchors represent lightness levels of zero (0) and
100, respectively. Next, remove the seven (7) sample patches from the envelope
labeled “numerical rating samples” and place them face-down to your side. Your
task is to compare the lightness of each sample to the two anchors, one sample at
a time, and select a number between zero (0) and 100 that best represents its
lightness level. After you make each decision, place the sample to the
side, face down, and record the lightness value you chose next to the matching
letter ID on the sample score-sheet. Any questions?
[Diagram: low anchor = 0 (left), high anchor = 100 (right); test sample evaluated one at a time]
Ref.: samples J, E, L, S, C, Y, D
10. 10
Experimental Instructions for Two-Anchor Numerical Rating
Lightness Experiment (Continued)
Double Anchor Numerical Rating
Score Sheet
Sample ID   Score
J           _____
E           _____
L           _____
S           _____
C           _____
Y           _____
D           _____
11. 11
Discussion from Double-Anchor Numerical
Rating (Lightness) Experiment
Lay your samples out in front of you based on the lightness scores that you assigned to each sample.
Are your numerical scores generally consistent over the wide range of lightness?
Can you use your scores to reliably detect small differences between samples? Are all of your scores consistent with relative
lightnesses when you directly compare samples (e.g. samples ‘Y’ and ‘D’)?
Would you consider this type of methodology better for:
A) wide range scaling at lower resolutions?
B) high resolution of small differences between samples?
Do you believe that this type of method would be practical for relatively large numbers of samples (e.g. handling, difficulty of
evaluation, etc.)?
[Scale graphic: 0 10 20 30 40 50 60 70 80 90 100]
12. 12
Numerical Ratings: Advantages/Limitations
The task of numerical rating is relatively simple for observers
This method is compatible with physical or imaginary anchors.
•Double anchor of zero to 100
The numerical rating method is effective for lower accuracy/precision
characterizations over wide ranges of perception.
It is also compatible with larger numbers of samples.
The numerical rating method is generally not effective for detecting small
differences between samples. This is because of our inherent limitations in
estimating absolute magnitudes over wide ranges (Engeldrum).
Data are approximately linear (interval or ratio scales)
13. 13
Experimental Instructions for
Rank Order Lightness Experiment
Please remove the six (6) samples (B, O, R, H, A & M) from the envelope
marked “rank order samples”. Comparing all six (6) samples to each other,
your task is to arrange them in order of increasing lightness with the darkest
sample towards your left and the lightest towards your right. Once you are
satisfied with your rank ordering of lightness, record the order of your samples
on the rank order score-sheet that is provided and position your samples in that
order on the sheet. Any questions?
darkest lightest
15. 15
Discussion from Rank Order Lightness Experiment
Do you believe that this is an effective method for detecting small lightness differences
between samples?
Do you believe that this would be an effective method for scaling samples over wide
ranges?
Could this method be used with a large number of samples?
What do you believe are the strengths and weaknesses compared to the numerical rating
method?
What would you conclude about the relative lightness between two given samples where
50% of the observers ranked sample A lighter than B and 50% ranked sample B lighter
than A?
If 75% of observers ranked sample A lighter than B and 90% of observers ranked sample
B lighter than C, would you conclude that there is a larger lightness difference between
A-B or B-C?
16. 16
Rank Order Method: Advantages/Limitations
The rank order “confusion” method is very effective for detecting small
differences between samples. This is based on our innate competencies in
making close comparative judgments. It is also effective for evaluating multi-
dimensional trade-offs such as overall IQ.
There are practical limits to numbers of samples (e.g. 8, 10, 15?)
Rank ordering tends to lose its effectiveness over relatively wide perceptual
ranges.
Rank order methods afford direct access to “preference”, “acceptability”, and
“noticeability” data.
We cannot assume that raw rank order data are linear; they are ordinal…
We can leverage “observer confusion” to derive interval scales using
Thurstone’s “law of comparative judgments”
17. 17
Relative Lightnesses of Sample Patches
ID      Relative Lightness   RGB
Black           0              0
J               5             13
E              20             51
L              45            115
S              58            148
C              68            173
O              80            204
R              81            207
H              82            209
A              83            212
M              84            214
Y              84            214
D              85            217
B              94            240
White         100            255
18. 18
Experimental Considerations
•In addition to the issues of conducting any designed experiment, you must be
aware of the following:
• Follow the same protocol/methodology for each observer
– Keep the test unbiased!
• Issues & Challenges
– Range and distribution of the characteristic
» Number of samples versus objective
– Image content
» 1st-party versus 3rd-party images
» Make sure the content spans the range of interest
– Number & type of observers
» Expert versus average users
– Sample preparation, handling and maintenance
» Sample borders
19. 19
Environmental Considerations
Presenting & viewing the samples
• Presentation Mode
– All-at-once versus one-at-a-time
– Viewing Distance
– Environmental factors
» Make sure the observer is comfortable
• Scene lighting
tungsten | daylight | fluorescent
20. 20
Classification of Scale Types
Nominal
• Principle/Definition: identity, classes
• Example: male vs. female
• Arithmetic operations: counting
• Statistical operations: χ² analysis, binomial

Ordinal
• Principle/Definition: order
• Example: order of finish in a race
• Arithmetic operations: greater-than or less-than operations
• Statistical operations: median, IQR, nonparametric

Interval
• Principle/Definition: ordering is known with respect to an attribute
• Example: temperature (°F); most personality measures
• Arithmetic operations: addition and subtraction of scale values
• Statistical operations: mean, ANOVA, multivariate

Ratio
• Principle/Definition: there is a rational zero point for the scale
• Example: temperature (K); distance
• Arithmetic operations: multiplication and division of scale values
• Statistical operations: geometric mean, ANOVA
21. 21
Direct versus Indirect Scaling
Direct Methods
In direct scaling, observers judge magnitude by directly assigning numbers
that correspond to the level of the stimulus
•Judgments are often made relative to a reference or anchor
Numeric estimation provides a very simple and easy method toward
constructing interval scale estimates
•Reliability of the estimate
Methods include
• Two-Anchor Grayscale Example
• Line or Ruler Based Systems
• Comrey Constant Sum Paired Comparison
• Magnitude Estimation
Observers are asked to assign a value to each stimulus.
Each stimulus is observed only once and one-at-a-time
•Tendency for a ‘learning curve’ effect
22. 22
Stevens’ Power Law
Ψ = R = k·S^b
log(R) = b·log(S) + log(k)
Where:
Ψ = subjective magnitude, one’s impression of the stimulus magnitude
R = the actual, but often unknown, magnitude of the response
S = physical magnitude of the stimulus
b = characteristic growth exponent
k = constant of proportionality / scaling constant
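Fitting the power law reduces to a linear regression in log-log space. This is a minimal sketch on synthetic, noiseless data generated with k = 2 and b = 0.33 (a brightness-like exponent); the regression recovers both constants.

```python
import math

# Recover a Stevens exponent b and constant k from (stimulus, response)
# pairs by least squares on log(R) = b*log(S) + log(k).
# Data are synthetic, not from the slides' experiments.

def fit_power_law(S, R):
    """Return (b, k) from the log-log linear regression."""
    xs = [math.log(s) for s in S]
    ys = [math.log(r) for r in R]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    k = math.exp(my - b * mx)
    return b, k

S = [1, 2, 4, 8, 16, 32]
R = [2.0 * s ** 0.33 for s in S]    # noiseless data with k = 2, b = 0.33
b, k = fit_power_law(S, R)
print(round(b, 3), round(k, 3))     # → 0.33 2.0
```

With real numeric-estimation data the points scatter around the line, and the slope of the fit is the estimated characteristic exponent.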
23. 23
Exponents of the Power Law
Exponents are interpreted as a
‘signature’ characterizing the
impression of that specific
sensation via numeric estimation
• Reported correlations on the
order of 99%
Critics argued that solely relying on
numeric estimates made it
impossible to:
• Verify the Power Law
• Confirm the characteristic
exponent
• Prove superiority of magnitude
scaling techniques
24. 24
Cross Modality Matching
Assumptions
•IF the coefficients in the previous
table truly do represent the
characteristic exponent, then any of
these stimuli could be used in lieu of
numeric estimation
•Results support the characteristic
exponents shown in the earlier table
•Responses other than numbers can
be successfully utilized
25. 25
Here’s Where the Rubber Hits the Road…
Any two quantitative response measures with established characteristic
exponents can be used to judge sensory stimuli (e.g., Text Quality)
‘Validity’ is determined by comparing the theoretical to empirical ratios
between the two response measures
Yields ratio-scale estimates of the sensory stimuli suitable for statistical
analyses (e.g., ANOVA and hypothesis testing)
S → Ψ → R1, R2
log(R1) = b1·log(S) + log(k1)
log(R2) = b2·log(S) + log(k2)
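The validity check can be sketched directly: if both response measures follow power laws in the same stimulus, then regressing log(R1) on log(R2) should give a slope equal to the theoretical ratio b1/b2. The exponents and constants below are illustrative values, not measured ones.

```python
import math

# Cross-modality validity check on synthetic data:
# R1 = k1*S^b1 and R2 = k2*S^b2 imply slope(log R1 vs log R2) = b1/b2.

b1, b2 = 0.6, 1.2                      # assumed characteristic exponents
S = [1, 2, 5, 10, 20, 50]
R1 = [1.5 * s ** b1 for s in S]
R2 = [0.8 * s ** b2 for s in S]

x = [math.log(r) for r in R2]
y = [math.log(r) for r in R1]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
empirical_slope = sum((u - mx) * (v - my) for u, v in zip(x, y)) / \
                  sum((u - mx) ** 2 for u in x)
print(round(empirical_slope, 3), round(b1 / b2, 3))   # → 0.5 0.5
```

With real observer data the two numbers only agree approximately, and the size of the discrepancy is the evidence for or against proportional judgment.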
26. 26
Text Quality Example
Finally…
Problem:
•Define the minimum acceptable text quality for draft mode printing
•Quantify the effect of ‘Factor A’ and ‘Factor B’ on perceived Text Quality
•Understand trade-offs
• Map out response as a function of input factors
Protocol:
•Test Samples were generated by simulation
•Factor settings data taken from competitive assessments
•Factors A & B were the only settings changed between simulations
•10x Scaling – Subjects were seated 5’ from samples for a 6-10” equivalent
viewing distance.
27. 27
General Procedure
Calibration Phase
•Subjects are given instruction and
practice in the use of two quantitative
response methods
•Allows for determining whether subjects’ judgments are proportional,
and to what degree
• Can be used to estimate the
effects of bias and provide
correction
Scaling Phase:
•Repeat same procedure only now
with stimuli of interest
29. 29
Bias Correction / Results of the Calibration Phase
For numerical estimation of a specified line length
Similar results seen with line length estimation
Can be used to test whether individuals are making proportional
judgments that track across the continuum
[Calibration plot: Log(Numeric Estimate) vs. Log(Actual Line Length), 0 to 2.5 on both axes, showing the theoretical line and the calibration fit y = 0.8706x + 0.2605, R² = 0.9956]
30. 30
Calculation of Ratio Scale Values
Under ideal conditions, NE^1.0 = LP^1.0
To compute magnitude scale values, ψ = (NE^1.0 × LP^1.0)^0.5
To correct for bias, ψ = (NE^(1/n) × LP^(1/n))^0.5
where n = exponent as determined from the calibration exercise
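The composite magnitude is just a bias-corrected geometric mean of the two responses. This sketch uses made-up NE and LP values; allowing a separate calibration exponent for each response measure is a small generalization of the single-n formula above (set both to the same n to match it exactly).

```python
# Sketch of the magnitude computation above with illustrative numbers.
# NE = numeric estimate, LP = produced line length,
# n_ne / n_lp = calibration exponents (hypothetical values below).

def magnitude(ne, lp, n_ne=1.0, n_lp=1.0):
    """Bias-corrected geometric mean: (NE^(1/n_ne) * LP^(1/n_lp))^0.5."""
    return (ne ** (1.0 / n_ne) * lp ** (1.0 / n_lp)) ** 0.5

# Ideal observer (n = 1.0): plain geometric mean of the two responses.
print(round(magnitude(40.0, 90.0), 2))              # → 60.0
# Regressive observer (n < 1): the correction expands the scale.
print(round(magnitude(40.0, 90.0, 0.87, 0.87), 2))
```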
31. 31
Comparison of Text Quality Responses
After Bias Correction
Yellow lines represent minimum acceptable Text quality as determined by the
observers
Corrected Response Data from 20 Samples
[Plot: Corrected[Log(Produced Line Length)] vs. Corrected[Log(Numerical Estimate)], 1.2 to 2.2 on both axes, showing the theoretical fit and the linear fit to the corrected response, y = 0.933x + 0.1421, R² = 0.8442]
32. 32
Variability Chart for the Composite Metric
The strong effect of Factor A is clearly seen
The replicates were very close to each other, indicating low experimental variability
[Variability chart: Composite Metric (1.5 to 2.1) plotted by Factor B within Factor A, each factor at levels -1.00, -0.33, 0.33, 1.00]
34. 34
Comparison of Text Quality Responses
After Bias Correction
In addition to understanding the factors, the minimum acceptable levels for
Factors A & B are also understood!
Corrected Response Data from 20 Samples
[Same plot as slide 31: corrected response with theoretical fit and linear fit y = 0.933x + 0.1421, R² = 0.8442]
35. 35
Direct Scaling Summary
Able to yield ratio-scale numbers for subsequent analyses
Considerable effort:
• Magnitude estimation required participant training as well as calibration steps
• Some participant fatigue was evident
• Some participants were not making proportional judgments
36. 36
Direct versus Indirect Scaling
Indirect Methods
Methods used to create a psychological scale in which information regarding an
individual’s ability to detect or discriminate stimuli is used to infer the magnitude
of a sensation
Typically easier to conduct and communicate to participants
• Magnitude Estimation required training the participants as well as
conducting a calibration step
•Data analysis requires additional theory for interval scale generation
• Thurstone’s Law of Comparative Judgment
Methods include:
•Rank Ordering
•Successive Categories
•Category Scaling
•Paired Comparison
37. 37
Rank Ordering and Successive Categories
Rank Ordering
Quick and easy, but information is lost and the distance between ranks
is non-uniform
Successive Categories
Difficult to keep equal distances between categories
Information may be lost due to limited resolution of the categories
38. 38
Thurstone’s Law of Comparative Judgment
Each stimulus is perceived through a ‘discriminal process’ that has some value
An observer may vary his or her response to the same stimulus when it is viewed
multiple times
•Resulting in a ‘confusion’ level
By presenting the stimulus multiple times to either the same observer and/or
over different observers, a frequency distribution can be determined over the
psychological continuum
•Psychometric function
Thurstone assumed that the distribution followed a Gaussian distribution with a
specific mean and standard deviation, which he called scale value and
discriminal spread
•Thurstone describes 6 cases with various assumptions
•Case V assumes no correlation between the stimuli and constant variance
across the different stimuli
39. 39
Thurstone’s Law of Comparative Judgment
Case 5
[Figure: three Gaussian response distributions with means 3, 7, and 11 plotted on the psychological response continuum (0 to 14); y-axis is frequency of response (0 to 0.4)]
40. 40
Example of Case 5 Applied to a Category Rating Experiment
Suppose we run a Categorical Rating experiment with the following 5 categories:
very bad | bad | neither bad nor good | good | very good
Assume that we had 7 samples and 15 participants

Scoring matrix (counts per category):
          very bad   bad   neither   good   very good
sample 1      2       1       8        3        1
sample 2      1       1       4        8        1
sample 3      6       1       6        1        1
sample 4      2       1       4        5        3
sample 5      6       2       4        1        2
sample 6      4       1       7        1        2
sample 7      1       1       1        3        9

Convert the scoring matrix to a cumulative count matrix:
          very bad   bad   neither   good   very good
sample 1      2       3      11       14       15
sample 2      1       2       6       14       15
sample 3      6       7      13       14       15
sample 4      2       3       7       12       15
sample 5      6       8      12       13       15
sample 6      4       5      12       13       15
sample 7      1       2       3        6       15
41. 41
Example of Case 5 (Continued)
Convert to a cumulative proportion matrix:
          very bad    bad    neither   good   very good
sample 1    0.133    0.200    0.733   0.933     1.000
sample 2    0.067    0.133    0.400   0.933     1.000
sample 3    0.400    0.467    0.867   0.933     1.000
sample 4    0.133    0.200    0.467   0.800     1.000
sample 5    0.400    0.533    0.800   0.867     1.000
sample 6    0.267    0.333    0.800   0.867     1.000
sample 7    0.067    0.133    0.200   0.400     1.000

Convert to a z-score matrix, then calculate means and standard deviations with some additional scaling:
          z(vb)    z(b)     z(n)     z(g)     mean   std-dev
sample 1  -1.130  -0.840    0.610    1.480    0.030    1.230
sample 2  -1.560  -1.130   -0.250    1.480   -0.365    1.345
sample 3  -0.250  -0.100    1.080    1.480    0.553    0.858
sample 4  -1.130  -0.840   -0.100    0.840   -0.308    0.879
sample 5  -0.250   0.080    0.840    1.080    0.438    0.626
sample 6  -0.640  -0.440    0.840    1.080    0.210    0.875
sample 7  -1.560  -1.130   -0.840   -0.250   -0.945    0.550

Perform similar operations for the category ratings (very bad … very good) to determine boundaries.
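The Case V arithmetic above can be sketched in a few lines: frequencies become cumulative proportions, proportions become z-scores, and each sample's scale value is its row mean. The frequencies are the slide's 7×5 matrix with 15 observers; because the slide rounds its z-scores, the values here agree only approximately with the printed matrix.

```python
from statistics import NormalDist

# Thurstone Case V sketch applied to the category-rating example above.
# The top category (cumulative proportion 1.0) is dropped, since the
# inverse normal CDF is undefined there.

freqs = [
    [2, 1, 8, 3, 1], [1, 1, 4, 8, 1], [6, 1, 6, 1, 1],
    [2, 1, 4, 5, 3], [6, 2, 4, 1, 2], [4, 1, 7, 1, 2],
    [1, 1, 1, 3, 9],
]
n_obs = 15
inv = NormalDist().inv_cdf

scale_values = []
for row in freqs:
    cum, running = [], 0
    for f in row:
        running += f
        cum.append(running / n_obs)          # cumulative proportions
    zs = [inv(p) for p in cum[:-1]]          # z-scores, final 1.0 dropped
    scale_values.append(sum(zs) / len(zs))   # row mean = scale value

for sid, v in enumerate(scale_values, start=1):
    print(f"sample {sid}: {v:+.3f}")
```

As on the slide, sample 3 comes out highest on the derived interval scale and sample 7 lowest.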
42. 42
Case 5 (Continued)
Interval scales estimates of the psychological response are calculated
Boundary limits are also established for the Rating levels
This method is very useful for competitive assessments as well as tracking
progress over time
[Bar charts: relative scale values (-3 to +3) for samples 1 through 7, and upper category boundaries for very bad, bad, neither bad nor good, and good]
43. 43
Indirect Scaling Summary
Rank ordering and successive categories are easy tests to administer
•Participants don’t require special training
• Other than the basic instructions for performing the task at hand
•Provide limited information
Thurstone’s Law of Comparative (Categorical) Judgment provides a means to
determine a psychological scale from comparison (category) judgments
• Requires assumptions about the distribution of both the psychological
response as well as the category boundaries
• Yields interval scale numbers
Paired comparisons (PC) can be utilized when the number of stimuli is fairly low
and a fine level of distinction is necessary
•PC can be extended to yield ratio scale numbers -> Comrey Constant Sum PC
•Other Methods…
44. 44
DASC Method
DASC: Double Anchor Successive Categories Method
Useful when you have:
• Lots of samples (up to ≈ 40)
• Need for an interval number for ANOVA analysis (DOE’s, etc.)
•Combination of Successive Categories and Magnitude Estimation
Method has three steps:
• Step 1: Organize samples into piles
• Step 2: Score each pile
• Step 3: Define score where acceptability begins
45. 45
Step 1: Organize the Piles
Step 1: Organize samples into piles, with worst on left, best on right
• How many piles? Driven by # samples to be tested.
• 3 piles x 3 samples/pile ≈ 9 samples max
• 4 piles x 4 samples/pile ≈ 16 samples max
• 5 piles x 5 samples/pile ≈ 25 samples max, etc.
• Only requirement is to end up with correct # piles sorted left to right
• Replicates are always included to get measure of experimental error
• Using piles is just a quick (and easier) way to score many samples
46. 46
Step 2: Score Each Pile
Step 2: Score each pile
• Double Anchors
• Score = 1: Worst possible _____________
• Score = 100: Best possible _____________
• If piles are similar to each other, scores should be similar
• If piles not similar, scores should reflect that
• Scores do not have to fill entire 1-100 score range
• Scores do not have to be evenly spaced
• It is unlikely that “real” samples will hit either anchor; if so, clarify instructions
• Every sample within a pile inherits that pile’s score
47. 47
Step 3: Get a Score for Overall Acceptability
Step 3: Get Score for Acceptability
• Starting at Score = 1 and moving to the right, find the hypothetical Score
where acceptability goes from unacceptable to acceptable for each observer
• Score does not have to equal a pile score
• Value is used to decide whether each sample in the test is unacceptable or
acceptable
• Example: If Score for Acceptability = 65, then:
• Piles < 65 are unacceptable
• Piles ≥ 65 are acceptable
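The three DASC steps can be sketched in code: samples are grouped into piles, every sample inherits its pile's score, and the observer's acceptability cutoff splits the samples into acceptable and unacceptable. Pile assignments, scores, and the cutoff below are made-up values, not the template data on the next slide.

```python
# Minimal DASC sketch for one observer (illustrative data only).

pile_scores = {1: 20, 2: 45, 3: 70, 4: 90}   # observer's pile magnitudes
pile_of = {                                  # which pile each sample landed in
    "A": 1, "B": 3, "C": 2, "D": 4, "E": 3, "F": 1,
}
cutoff = 65                                  # observer's acceptability score

# Step 2: every sample inherits its pile's score.
sample_score = {s: pile_scores[p] for s, p in pile_of.items()}
# Step 3: the cutoff classifies each sample.
acceptable = {s: score >= cutoff for s, score in sample_score.items()}

print(sample_score["B"], acceptable["B"])    # → 70 True
print(sample_score["A"], acceptable["A"])    # → 20 False
```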
48. 48
Sample Scoring via EXCEL Template
•Tester input in yellow boxes
• Observer name entered
• Piles are scored and entered
under Magnitude
• Acceptability Cutoff = 81
• Every sample under the leftmost pile (80) gets a “1”
• Samples 7, 13, 15, 17, 19, 21, 23
• Etc.
• Number of samples recorded is
calculated both down and over (see
“Checks”)
•Data ready for analysis SW
Observer: Michelle

Category:        1    2    3    4    5
Magnitude:      80   89   92   95  100
Samples found:   7   11    8    2    7    (Total Found = 35; see “Checks”)

Acceptability Cutoff = 81

[Grid: rows are Sample IDs 1 through 35; a single “1” is entered per sample in the column of the Category in which that sample was found. Need to find ONE Category per Sample.]

Proctor’s Notes:
1) Enter the Observer’s ID above.
2) Pick up the samples from Category 1. Enter the number 1 (one) at the grid
location defined by the row (Sample Number) and column (Category) in which the
sample is found.
3) Repeat step 2 with samples from each of the successive categories.
4) Have the Observer assign a Magnitude (relating to “goodness”) for each
Category. Magnitude must be between 1 and 100.

Checks:
• Total Found must equal the number of Samples.
• The value in the row of “Checks” must equal the number of samples found in each Category.
• All “Check Cells” are formatted in bold red when the data are not correct.
49. 49
Data Ready for Analysis Software
•First compute mean scores from raw data
• Run ANOVA model – fixed by DOE design
• Linear, response surface, D-Optimal, whatever
• Is model a decent one?
• R² and R²(adj): how much of the total variation is explained by the model
• Go for parsimony – no insignificant factors included
• RMSE reasonable?
• If model decent, save prediction formula
• Compare Predicted vs. Actual
• Make significance tests
• Orthogonal contrasts/comparisons
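The slides feed the mean scores to dedicated analysis software; as a minimal illustration of what the ANOVA step computes, here is a hand-rolled one-way F-ratio on made-up DASC mean scores (three factor levels, three replicates each).

```python
# Hand computation of a one-way ANOVA F-ratio, for illustration only.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    # Between-group sum of squares: group means vs. the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: observations vs. their group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

groups = [[52, 55, 50], [61, 64, 63], [70, 68, 73]]   # three factor levels
F, df_b, df_w = one_way_anova(groups)
print(df_b, df_w, round(F, 1))   # → 2 6 49.0
```

A large F relative to the F-distribution with (2, 6) degrees of freedom corresponds to the small Prob > F values shown in the output on the next slide.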
50. 50
ANOVA Analysis
•Summary of fit, whole model ANOVA, Effect tests, etc.
[Scatter plot: Mean(Score) vs. P (Mean Score), 0 to 100 on both axes, with bivariate normal ellipse (P = 0.950) and linear fit]

Summary of Fit
RSquare                      0.926
RSquare Adj                  0.915
Root Mean Square Error       3.864
Mean of Response            51.254
Observations (or Sum Wgts)  24.000

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       3        3726.3512       1242.12   83.1937   <.0001*
Error      20         298.6083         14.93
C. Total   23        4024.9596

Effect Tests
Source   Nparm   DF   Sum of Squares   F Ratio   Prob > F
mode         2    2        2802.5508   93.8537   <.0001*
image        1    1         923.8004   61.8737   <.0001*
51. 51
Analysis: Logistic Regression
•Raw score data used as regressor (X
value)
• Binary yes/no used as response (Y
value)
• Result, if decent, is a probability
prediction
• When score = xx, probability of
acceptance = yy
• Essentially, this is a calibration of the
score scale
• Area under blue line = prob. of NOT
being acceptable
• Area above blue line = prob. of being
acceptable
[Logistic plot: probability of “Acceptable” (0.00 to 1.00) vs. Score (0 to 100), with the observed no/yes responses]
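The calibration described above can be sketched with a plain gradient-descent logistic fit: raw scores as the regressor, binary yes/no as the response, and the fitted curve giving P(acceptable) at any score. The score/response data below are synthetic, not the study's.

```python
import math

# Logistic calibration sketch on synthetic acceptability data.

def fit_logistic(xs, ys, lr=0.5, steps=20000):
    """Return (b0, b1) for P(y=1|x) = 1/(1+exp(-(b0 + b1*x)))."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y                  # gradient of the log-loss
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
accept = [0, 0, 0, 0, 1, 0, 1, 1, 1]     # observer yes/no judgments
xs = [s / 100.0 for s in scores]         # rescale for stable descent
b0, b1 = fit_logistic(xs, accept)

def p_accept(score):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score / 100.0)))

print(round(p_accept(20), 2), round(p_accept(80), 2))
```

In practice a statistics package's logistic-regression routine does the same fit by maximum likelihood; the point is only that the fitted curve calibrates the score scale into a probability of acceptance.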
52. 52
Variance Components Study using DASC Methodology
Photo Media Artifact
There was an unwanted artifact when printing on photo media
Popular opinion held that it was a media-related problem
Set up a study where 4 media types were printed across 3 randomly selected
printers with a variety of image content
[Variability chart: Magnitude (20 to 90) by printer (B, C, M) within media (Do, Pr, Sp, St)]
Variance components (% of total): Media 40%, Media*Printer 7%, Printer 42%, Within 12%
53. 53
Logistic Regression of Photo Media Artifact Study
Point of Subjective Equality: ~72 on the Magnitude scale
JND (as determined by the 75th minus the 50th percentiles): ~5 units
Printer-to-printer differences could be as extreme as 40 Magnitude units
54. 54
Comparison of Psychometric Methods
Criteria                  Rank Order        Category Rating   Paired Compr.     Rating Scales     Magnitude Est.    Comrey CS PC      DASC
Indirect/Direct           Indirect          Indirect          Indirect          Direct            Direct            Direct            Hybrid
# Observers               Mod               High              Low               High/Mod          Mod               Low               Mod/High
# Presentations           n                 n                 n(n-1)/2          n                 n                 n(n-1)/2          n
Range of Stimuli          Mod               Wide              Small             Wide              Mod               Small             Small/Mod
Numerical Scale           Nominal/Interval  Interval          Interval          Interval          Ratio             Ratio             Interval
Ability to Discriminate   High              Mod               High              Mod               Mod               High              Mod
Scale Variability         Mod               High              Mod               High              High              Mod/High          Mod
Training Req'd            No                No                No                No                Yes               Yes               No
Test Duration             Short             Short             Mod               Short             Mod/High          Mod               Mod
55. 55
Acknowledgements
Kodak Contributors:
Dana Aultman*
Randy Dumas
Wayne Richard
Steve Billow
References:
Barker, Thomas, class notes from Empirical Modeling (CQAS 0875)
Engeldrum, Peter G. (2000), Psychometric Scaling: A Toolkit for Imaging Systems
Development, Imcotek Press
Miller, M., IST: Psychophysics, Eastman Kodak internal reference
Lodge, Milton (1981), Magnitude Scaling: Quantitative Measurement of Opinions, Sage
Publications
*Retiree as of 12/07
56. 56
Summary
High Level Overview of Psychometric Theory
Experimental Conditions
Review of Different Methods (Including Trade-Offs)
•Direct
•Indirect
•Hybrid
Reviewed Some Practical Examples
Editor's Notes
Trying to apply a scientific approach to studying relationships between human perception and measured physical characteristics
This leads to the development of the psychometric function
Sometimes called the frequency-of-seeing curve; it has the form of a cumulative distribution function
Visual evaluations are not always compared to physical measurements
Evaluations can be of ‘feel’ or of how photo-like a sample is