Psychometric Studies in the
Development of the Kodak AiO
Inkjet Printer
David Lee
Inkjet Systems Division
Eastman Kodak Company
2
The Kodak Value Proposition
“Implementing Kodak’s Digital Imaging System Strategy” … leading to three key vectors of differentiation:
• IMAGE QUALITY (“Real Kodak Pictures”): the look, feel, durability, and fade-resistance of Kodak pictures.
• ONE BUTTON, SIMPLE & FAST (“Really Easy”): simple, convenient photos and text, with and without a PC.
• REAL VALUE (“Kodak Quality”): affordable pictures.
3
Agenda
Introduction / Overview
• What is psychometrics?
• Thresholds, Just-Noticeable Differences and the Psychometric Function
• Why do we need it?
• Class examples
Experimental Considerations
• Unbiased
• Environmental considerations (e.g., lighting)
• Issues & Challenges
– Number of samples
Technical Aspects with Examples
• Measurement/Numerical Scales
• Selected scaling methodologies and data analysis
– Direct versus indirect methods
– Hybrid DASC Method
Summary/Acknowledgements
4
What is Psychometrics?
What is psychophysics/psychometrics? I’m not sure!
• The branch of psychology concerned with quantitative relations between physical stimuli and their psychological effects
• A scientific approach to studying relationships between human perception (psychology) and measured physical characteristics
human perception ↔ physical measurements
5
Thresholds and Just-Noticeable Differences (JND)
Two Key Questions…
“What is the value of the characteristic that is just visible or just detectable?”
“What is the minimum value of the characteristic that is seen as different from a reference or standard?”
For example, if we present a stimulus varying in intensity, the observer responds with ‘Yes’ or ‘No’ indicating whether they see/feel/observe the difference.
• Estimate the probability or likelihood of a ‘successful’ outcome
• Think of shining a light in a dark room starting at a very low intensity
• The voltage is slowly increased until the observer notices a difference
• Some people will be more sensitive and notice a difference earlier, while others won’t see a difference until the power is much higher
6
[Figure: the psychometric function, plotting the proportion of ‘Yes’ responses (psychology) against stimulus intensity in volts (physics). The curve locates the stimulus (absolute) threshold and the just-noticeable difference.]
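The figure above can be sketched numerically. Given hypothetical yes/no proportions at each voltage (illustrative values, not data from the study), the 50% crossing gives the absolute threshold, and the 75% minus 50% crossing gives one common JND estimate:

```python
# Hypothetical yes/no detection data (not from the study): proportion of
# "Yes" responses at each voltage level of the light in the dark room.
volts =    [5,    6,    7,    8,    9,    10,   11,   12]
prop_yes = [0.02, 0.05, 0.16, 0.40, 0.70, 0.89, 0.97, 0.99]

def interpolate_crossing(x, y, level):
    """Linearly interpolate the stimulus value where y first crosses `level`."""
    for i in range(len(y) - 1):
        if y[i] <= level <= y[i + 1]:
            frac = (level - y[i]) / (y[i + 1] - y[i])
            return x[i] + frac * (x[i + 1] - x[i])
    raise ValueError("level not bracketed by the data")

threshold = interpolate_crossing(volts, prop_yes, 0.50)   # absolute threshold
jnd = interpolate_crossing(volts, prop_yes, 0.75) - threshold  # 75% - 50% point

print(f"threshold ~ {threshold:.2f} V, JND ~ {jnd:.2f} V")
```

In practice one would fit a smooth psychometric function (e.g., a cumulative Gaussian) rather than interpolate, but the threshold and JND definitions are the same.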
7
Why do we need psychophysics?
Ties the Voice of the Customer (VOC) to…
Marketing:
•Benchmarking
• Where do we stand with respect to competitors?
•Program Goals
• What’s the ‘best’ level of _______
Product Design & Development:
•Design Trade-offs:
• Print quality versus print speed
• Tonescale (Color Gamut) versus ink laydown
•Troubleshooting
• How noticeable is an artifact?
•Monitor Development Progress
• Where are we with things?
• Has the perception of a characteristic changed over time?
8
Let’s try an example
or two…
9
Remove the two (2) reference anchors from the envelope labeled “anchors” and
place the darker one to your left and the lighter one to your right. As marked
towards the bottom, these anchors represent lightness levels of zero (0) and
100, respectively. Next, remove the seven (7) sample patches from the envelope
labeled “numerical rating samples” and place them face-down to your side. Your
task is to compare the lightness of each sample to the two anchors, one sample at
a time, and select a number between zero (0) and 100 that best represents its
lightness level. After you make each of your decisions, place the sample to the
side, face down, and record the lightness value you chose next to the matching
letter ID on the sample score-sheet. Any questions?
low anchor = 0 high anchor = 100
Experimental Instructions for Two-Anchor Numerical
Rating Lightness Experiment
test sample
(evaluated one at a time)
Ref.: samples J, E, L, S, C, Y, D
10
Experimental Instructions for Two-Anchor Numerical Rating
Lightness Experiment (Continued)
Double Anchor Numerical Rating
Score Sheet
Sample ID Score
J
E
L
S
C
Y
D
11
Discussion from Double-Anchor Numerical
Rating (Lightness) Experiment
Lay your samples out in front of you based on the lightness scores that you assigned to each sample.
Are your numerical scores generally consistent over the wide range of lightness?
Can you use your scores to reliably detect small differences between samples? Are all of your scores consistent with relative
lightnesses when you directly compare samples (e.g. samples ‘Y’ and ‘D’)?
Would you consider this type of methodology better for:
A) wide range scaling at lower resolutions?
B) high resolution of small differences between samples?
Do you believe that this type of method would be practical for relatively large numbers of samples (e.g. handling, difficulty of
evaluation, etc.)?
12
Numerical Ratings: Advantages/Limitations
The task of numerical rating is relatively simple for observers
This method is compatible with physical or imaginary anchors.
•Double anchor of zero to 100
The numerical rating method is effective for lower accuracy/precision
characterizations over wide ranges of perception.
It is also compatible with larger numbers of samples.
The numerical rating method is generally not effective for detecting small
differences between samples. This is because of our inherent limitations in
estimating absolute magnitudes over wide ranges (Engeldrum).
Data are approximately linear (interval or ratio scales)
13
Experimental Instructions for
Rank Order Lightness Experiment
Please remove the six (6) samples (B, O, R, H, A & M) from the envelope
marked “rank order samples”. Comparing all six (6) samples to each other,
your task is to arrange them in order of increasing lightness with the darkest
sample towards your left and the lightest towards your right. Once you are
satisfied with your rank ordering of lightness, record the order of your samples
on the rank order score-sheet that is provided and position your samples in that
order on the sheet. Any questions?
darkest lightest
14
Rank Order Score Sheet
Darkest Lightest
15
Discussion from Rank Order Lightness Experiment
Do you believe that this is an effective method for detecting small lightness differences
between samples?
Do you believe that this would be an effective method for scaling samples over wide
ranges?
Could this method be used with a large number of samples?
What do you believe are the strengths and weaknesses compared to the numerical rating
method?
What would you conclude about the relative lightness between two given samples where
50% of the observers ranked sample A lighter than B and 50% ranked sample B lighter
than A?
If 75% of observers ranked sample A lighter than B and 90% of observers ranked sample
B lighter than C, would you conclude that there is a larger lightness difference between A-
B or B-C?
16
Rank Order Method: Advantages/Limitations
The rank order “confusion” method is very effective for detecting small
differences between samples. This is based on our innate competencies in
making close comparative judgments. It is also effective for evaluating multi-
dimensional trade-offs such as overall IQ.
There are practical limits to numbers of samples (e.g. 8, 10, 15?)
Rank ordering tends to lose its effectiveness over relatively wide perceptual
ranges.
Rank order methods afford direct access to “preference”, “acceptability”, and
“noticeability” data.
We cannot assume that raw rank order data are linear; they are ordinal…
We can leverage “observer confusion” to derive interval scales using
Thurstone’s “law of comparative judgments”
17
Relative Lightnesses of Sample Patches
ID Relative Lightness RGB
Black 0 0
J 5 13
E 20 51
L 45 115
S 58 148
C 68 173
O 80 204
R 81 207
H 82 209
A 83 212
M 84 214
Y 84 214
D 85 217
B 94 240
White 100 255
18
Experimental Considerations
•In addition to the issues of conducting any designed experiment you must be
aware of the following:
• Follow the same protocol/methodology for each observer
– Keep the test unbiased!
• Issues & Challenges
– Range and distribution of the characteristic
» Number of samples versus objective
– Image content
» 1st party versus 3rd party images
» Make sure the content spans the range of interest
– Number & type of observers
» Expert versus average users
– Sample preparation, handling and maintenance
» Sample borders
19
Environmental Considerations
Presenting & viewing the samples
• Presentation Mode
– All-at-once versus one-at-a-time
– Viewing Distance
– Environmental factors
» Make sure the observer is comfortable
• Scene lighting
daylight, tungsten, fluorescent
20
Classification of Scale Types

Scale    | Principle / Definition                          | Example                              | Arithmetic Operations                        | Statistical Operations
---------|-------------------------------------------------|--------------------------------------|----------------------------------------------|---------------------------
Nominal  | Identity; classes                               | Male vs. female                      | Counting                                     | χ² analysis, binomial
Ordinal  | Order                                           | Order of finish in a race            | Greater-than or less-than operations         | Median, IQR, nonparametric
Interval | Ordering is known with respect to an attribute  | Temp (°F); most personality measures | Addition and subtraction of scale values     | Mean, ANOVA, multivariate
Ratio    | There is a rational zero point for the scale    | Temp (K); distance                   | Multiplication and division of scale values  | Geometric mean, ANOVA
21
Direct versus Indirect Scaling
Direct Methods
Judgments of the magnitude are made in direct scaling through directly
assigning numbers that correspond to the level of the stimuli
•Judgments are often made relative to a reference or anchor
Numeric estimation provides a simple, easy method for constructing
interval-scale estimates
•Reliability of the estimate
Methods include
• Two-Anchor Grayscale Example
• Line or Ruler Based Systems
• Comrey Constant Sum Paired Comparison
• Magnitude Estimation
Observers are asked to assign a value to each stimulus.
Each stimulus is observed only once and one-at-a-time
•Tendency for a ‘learning curve’ effect
22
Stevens’ Power Law

Ψ = R = k·S^b
log(R) = b·log(S) + log(k)

Where:
Ψ = subjective magnitude, one’s impression of the stimulus magnitude
R = the actual, but often unknown, magnitude of the response
S = physical magnitude of the stimulus
b = characteristic growth exponent
k = constant of proportionality / scaling constant
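The log-log form above makes fitting the power law a straight-line regression. A minimal sketch, with the stimulus values and exponent made up for illustration:

```python
import math

# Illustrative (hypothetical) stimuli S and magnitude estimates R,
# generated noise-free from R = k * S**b with k = 2, b = 0.6.
k_true, b_true = 2.0, 0.6
S = [1, 2, 4, 8, 16, 32]
R = [k_true * s ** b_true for s in S]

# Fit log(R) = b*log(S) + log(k) by ordinary least squares.
xs = [math.log(s) for s in S]
ys = [math.log(r) for r in R]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
k_est = math.exp(my - b * mx)

print(f"estimated exponent b = {b:.3f}, scaling constant k = {k_est:.3f}")
```

With real observer data the points scatter about the line, and b is read off as the slope of the log-log fit.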
23
Exponents of the Power Law
 Exponents are interpreted as a
‘signature’ characterizing the
impression of that specific
sensation via numeric estimation
• Reported correlations on the
order of 99%
 Critics argued that solely relying on
numeric estimates made it
impossible to:
• Verify the Power Law
• Confirm the characteristic
exponent
• Prove superiority of magnitude
scaling techniques
24
Cross Modality Matching
Assumptions
•IF the coefficients in the previous
table truly do represent the
characteristic exponent, then any of
these stimuli could be used in lieu of
numeric estimation
•Results support the characteristic
exponents shown in the earlier table
•Responses other than numbers can
be successfully utilized
25
Here’s Where the Rubber Hits the Road…
Any two quantitative response measures with established characteristic
exponents can be used to judge sensory stimuli (e.g., Text Quality)
‘Validity’ is determined by comparing the theoretical to empirical ratios
between the two response measures
Yields ratio-scale estimates of the sensory stimuli suitable for statistical
analyses (e.g., ANOVA & hypothesis testing)
S → Ψ → R₁, R₂ (two response measures of the same stimulus)
log(R₁) = b₁·log(S) + log(k₁)
log(R₂) = b₂·log(S) + log(k₂)
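The validity check can be illustrated in a few lines: with two response measures generated under assumed exponents b₁ and b₂ (hypothetical values below), regressing log(R₁) on log(R₂) should recover the theoretical ratio b₁/b₂:

```python
import math

# Hypothetical cross-modality check: two response measures of the same
# stimuli, with assumed characteristic exponents b1 and b2.
b1, b2 = 0.6, 1.1
S = [1, 2, 4, 8, 16]
R1 = [s ** b1 for s in S]    # e.g., numeric estimation
R2 = [s ** b2 for s in S]    # e.g., produced line length

# Regress log(R1) on log(R2); the slope should match b1/b2 if the
# characteristic exponents are valid.
xs = [math.log(r) for r in R2]
ys = [math.log(r) for r in R1]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

print(f"empirical slope {slope:.3f} vs theoretical ratio {b1 / b2:.3f}")
```

Real data would show scatter; a large gap between the empirical slope and b₁/b₂ would cast doubt on the assumed exponents.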
26
Text Quality Example
Finally…
Problem:
•Define the minimum acceptable text quality for draft mode printing
•Quantify the effect of ‘Factor A’ and ‘Factor B’ on perceived Text Quality
•Understand trade-offs
• Map out response as a function of input factors
Protocol:
•Test Samples were generated by simulation
•Factor settings data taken from competitive assessments
•Factors A & B were the only settings changed between simulations
•10x Scaling – Subjects were seated 5’ from samples for a 6-10” equivalent
viewing distance.
27
General Procedure
Calibration Phase
•Subjects are given instruction and
practice in the use of two quantitative
response methods
•Allows for determining whether subjects’ judgments are proportional, and to
what degree
• Can be used to estimate the
effects of bias and provide
correction
Scaling Phase:
•Repeat same procedure only now
with stimuli of interest
28
Survey Calibration Instructions
29
Bias Correction / Results of the Calibration Phase
For numerical estimation of a specified line length
Similar results were seen with line-length production
Can be used to test whether individuals are making proportional
judgments that track across the continuum
[Figure: log(numeric estimate) vs. log(actual line length), with theoretical and calibration linear fits. Calibration fit: y = 0.8706x + 0.2605, R² = 0.9956.]
30
Calculation of Ratio Scale Values
Under ideal conditions, NE^1.0 = LP^1.0
To compute magnitude scale values: ψ = (NE^1.0 · LP^1.0)^0.5
To correct for bias: ψ = (NE^(1/n) · LP^(1/n))^0.5
where n = exponent as determined from the calibration exercise
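A small sketch of the ψ computation above; the NE and LP values and calibration exponents below are hypothetical, not from the study:

```python
# Sketch of the bias-corrected magnitude computation in the slide's
# notation: psi = (NE**(1/n_ne) * LP**(1/n_lp)) ** 0.5, where each n
# comes from the calibration regression (e.g., a slope like 0.8706).
# All sample values below are hypothetical.

def corrected_psi(ne, lp, n_ne, n_lp):
    """Geometric mean of the two bias-corrected response measures."""
    return (ne ** (1.0 / n_ne) * lp ** (1.0 / n_lp)) ** 0.5

# Ideal observers (n = 1.0): psi reduces to the geometric mean of NE and LP.
assert corrected_psi(25.0, 25.0, 1.0, 1.0) == 25.0

psi = corrected_psi(ne=40.0, lp=55.0, n_ne=0.87, n_lp=0.93)
print(f"bias-corrected magnitude: {psi:.2f}")
```

Exponents below 1.0 (observers compressing the scale) inflate the corrected values relative to the raw geometric mean, which is the intended compensation.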
31
Comparison of Text Quality Responses
After Bias Correction
Yellow lines represent minimum acceptable Text quality as determined by the
observers
[Figure: corrected log(produced line length) vs. corrected log(numerical estimate), corrected response data from 20 samples, with theoretical fit. Linear fit: y = 0.933x + 0.1421, R² = 0.8442.]
32
Variability Chart for the Composite Metric
The strong effect of Factor A is clearly seen
Replicate-to-replicate variability was small and consistent
[Variability chart: Composite Metric (1.5 to 2.1) plotted for Factor B within Factor A, each factor at levels -1.00, -0.33, 0.33, 1.00.]
33
ANOVA Results
[Prediction profiler: Composite Metric = 2.043 ± 0.037 at Factor A = 1.0, Factor B = 0.17; desirability = 0.89.]

Summary of Fit
RSquare: 0.970998
RSquare Adj: 0.96064
Root Mean Square Error: 0.033764
Mean of Response: 1.831543
Observations (or Sum Wgts): 20
34
Comparison of Text Quality Responses
After Bias Correction
In addition to understanding the factors, the minimum acceptable levels for
Factors A & B are also understood!
[Figure: corrected log(produced line length) vs. corrected log(numerical estimate), corrected response data from 20 samples, with theoretical fit. Linear fit: y = 0.933x + 0.1421, R² = 0.8442.]
35
Direct Scaling Summary
Able to yield ratio scale
numbers for subsequent
analyses
Considerable effort
Magnitude estimation
required participant training
as well as calibration steps
Some participant fatigue
was evident
Some participants were not
following proportional
judgments
36
Direct versus Indirect Scaling
Indirect Methods
Methods used to create a psychological scale in which information regarding an
individual's ability to detect or discriminate stimuli is used to infer the magnitude
of a sensation
Typically easier to conduct and communicate to participants
• Magnitude Estimation required training the participants as well as
conducting a calibration step
•Data analysis requires additional theory for interval scale generation
• Thurstone’s Law of Comparative Judgment
Methods include:
•Rank Ordering
•Successive Categories
•Category Scaling
•Paired Comparison
37
Rank Ordering and Successive Categories
Rank Ordering
Quick and easy, but loses information; distances between ranks are non-uniform
Successive Categories
Difficult to keep equal distances between categories
Information may be lost due to limited resolution of the categories
38
Thurstone’s Law of Comparative Judgment
Each stimulus is perceived through a ‘discriminal process’ that has some value
An observer may vary his or her response for the same stimulus when viewed
multiple times
•Resulting in a ‘confusion’ level
By presenting the stimulus multiple times to either the same observer and/or
over different observers, a frequency distribution can be determined over the
psychological continuum
•Psychometric function
Thurstone assumed that the distribution followed a Gaussian distribution with a
specific mean and standard deviation, which he called scale value and
discriminal spread
•Thurstone describes 6 cases with various assumptions
•Case V assumes no correlation between the stimuli and constant variance
across the different stimuli
39
Thurstone’s Law of Comparative Judgment
Case 5
[Figure: example of 3 stimuli as Gaussian frequency-of-response distributions (means 3, 7, and 11) along the psychological response continuum.]
40
Example of Case 5 Applied to a Category Rating Experiment
Suppose we run a Categorical Rating experiment with the following 5 categories:
very bad, bad, neither bad nor good, good, very good
Assume that we had 7 samples and 15 participants

Scoring matrix (counts per category):
Sample   | very bad | bad | neither | good | very good
sample 1 |    2     |  1  |    8    |  3   |    1
sample 2 |    1     |  1  |    4    |  8   |    1
sample 3 |    6     |  1  |    6    |  1   |    1
sample 4 |    2     |  1  |    4    |  5   |    3
sample 5 |    6     |  2  |    4    |  1   |    2
sample 6 |    4     |  1  |    7    |  1   |    2
sample 7 |    1     |  1  |    1    |  3   |    9

Convert the scoring matrix to a cumulative count matrix:
Sample   | very bad | bad | neither | good | very good
sample 1 |    2     |  3  |   11    |  14  |   15
sample 2 |    1     |  2  |    6    |  14  |   15
sample 3 |    6     |  7  |   13    |  14  |   15
sample 4 |    2     |  3  |    7    |  12  |   15
sample 5 |    6     |  8  |   12    |  13  |   15
sample 6 |    4     |  5  |   12    |  13  |   15
sample 7 |    1     |  2  |    3    |   6  |   15
41
Example of Case 5 (Continued)
Convert to a cumulative proportion matrix
Convert to z-scores; calculate means and standard deviations, with some additional scaling
Perform similar operations for the ratings (Very Bad … Very Good) to determine category boundaries

Cumulative proportions:
Sample   | very bad | bad   | neither | good  | very good
sample 1 |  0.133   | 0.200 |  0.733  | 0.933 |   1.000
sample 2 |  0.067   | 0.133 |  0.400  | 0.933 |   1.000
sample 3 |  0.400   | 0.467 |  0.867  | 0.933 |   1.000
sample 4 |  0.133   | 0.200 |  0.467  | 0.800 |   1.000
sample 5 |  0.400   | 0.533 |  0.800  | 0.867 |   1.000
sample 6 |  0.267   | 0.333 |  0.800  | 0.867 |   1.000
sample 7 |  0.067   | 0.133 |  0.200  | 0.400 |   1.000

z-score matrix (the p = 1.000 column yields no finite z-score and is dropped):
Sample   |   z1   |   z2   |   z3   |   z4   |  mean  | std-dev
sample 1 | -1.130 | -0.840 |  0.610 |  1.480 |  0.030 |  1.230
sample 2 | -1.560 | -1.130 | -0.250 |  1.480 | -0.365 |  1.345
sample 3 | -0.250 | -0.100 |  1.080 |  1.480 |  0.553 |  0.858
sample 4 | -1.130 | -0.840 | -0.100 |  0.840 | -0.308 |  0.879
sample 5 | -0.250 |  0.080 |  0.840 |  1.080 |  0.438 |  0.626
sample 6 | -0.640 | -0.440 |  0.840 |  1.080 |  0.210 |  0.875
sample 7 | -1.560 | -1.130 | -0.840 | -0.250 | -0.945 |  0.550
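The pipeline above (counts → cumulative proportions → z-scores → mean scale value) can be sketched for three of the samples. Small differences from the tabled values are expected because the slide rounds its z-scores:

```python
# Minimal Thurstone Case V sketch for the category-rating data above:
# convert cumulative proportions to z-scores (dropping the final column,
# whose z is infinite) and average them per sample.
from statistics import NormalDist

counts = {                      # scoring-matrix rows from the slide
    "sample 1": [2, 1, 8, 3, 1],
    "sample 3": [6, 1, 6, 1, 1],
    "sample 7": [1, 1, 1, 3, 9],
}
n_obs = 15
inv = NormalDist().inv_cdf

scale = {}
for sid, row in counts.items():
    cum, z = 0, []
    for c in row[:-1]:          # skip the last category boundary (p = 1.0)
        cum += c
        z.append(inv(cum / n_obs))
    scale[sid] = sum(z) / len(z)    # Case V scale value (mean z)

for sid, value in sorted(scale.items(), key=lambda kv: kv[1]):
    print(f"{sid}: {value:+.3f}")
```

The ordering (sample 7 worst, sample 3 best of these three) matches the slide's means of -0.945, 0.030, and 0.553.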
42
Case 5 (Continued)
Interval-scale estimates of the psychological response are calculated
Boundary limits are also established for the rating levels
This method is very useful for competitive assessments as well as tracking
progress over time

[Charts: relative scale values for samples 1-7, and upper category boundaries for very bad, bad, neither bad nor good, and good, both plotted on a roughly -3 to +3 relative scale.]
43
Indirect Scaling Summary
Rank ordering and successive categories are easy tests to administer
•Participants don't require special training
• Other than the basic instructions for performing the task at hand
•Provide limited information
Thurstone's Law of Comparative (Categorical) Judgment provides a means to
determine a psychological scale from comparison (category) judgments
• Requires assumptions about the distributions of both the psychological
response and the category boundaries
• Yields interval scale numbers
Paired comparisons (PC) can be utilized when the number of stimuli is fairly low
and a fine level of distinction is necessary
•PC can be extended to yield ratio scale numbers -> Comrey Constant Sum PC
•Other Methods…
44
DASC Method
DASC: Double Anchor Successive Categories Method
Useful when you have:
• Lots of samples (up to ≈ 40)
• Need for an interval-scale number for ANOVA (DOEs, etc.)
•Combination of Successive Categories and Magnitude Estimation
Method has three steps:
• Step 1: Organize samples into piles
• Step 2: Score each pile
• Step 3: Define score where acceptability begins
45
Step 1: Organize the Piles
Step 1: Organize samples into piles, with worst on left, best on right
• How many piles? Driven by # samples to be tested.
• 3 piles x 3 samples/pile ≈ 9 samples max
• 4 piles x 4 samples/pile ≈ 16 samples max
• 5 piles x 5 samples/pile ≈ 25 samples max, etc.
• Only requirement is to end up with correct # piles sorted left to right
• Replicates are always included to get measure of experimental error
• Using piles is just a quick (and easier) way to score many samples
46
Step 2: Score Each Pile
Step 2: Score each pile
• Double Anchors
• Score = 1: Worst possible _____________
• Score = 100: Best possible _____________
• If piles are similar to each other, scores should be similar
• If piles not similar, scores should reflect that
• Scores do not have to fill entire 1-100 score range
• Scores do not have to be evenly spaced
• It is unlikely that “real” samples will hit either anchor; if so, clarify instructions
• Every sample within a pile inherits that pile’s score
47
Step 3: Get a Score for Overall Acceptability
Step 3: Get Score for Acceptability
• Starting at Score = 1 and moving to the right, find the hypothetical Score
where acceptability goes from unacceptable to acceptable for each observer
• Score does not have to equal a pile score
• Value is used to decide whether each sample in the test is unacceptable or
acceptable
• Example: If Score for Acceptability = 65, then:
• Piles < 65 are unacceptable
• Piles ≥ 65 are acceptable
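The three steps can be sketched in a few lines; the pile contents, pile scores, and cutoff below are hypothetical:

```python
# Sketch of the three DASC steps with hypothetical piles and scores.
# Pile scores are the double-anchor magnitudes (1 = worst possible,
# 100 = best possible); every sample inherits its pile's score.
piles = {               # pile score -> sample IDs sorted into that pile
    60: ["s3", "s7"],
    72: ["s1", "s5", "s6"],
    90: ["s2", "s4"],
}
cutoff = 65             # observer's hypothetical acceptability score

# Step 2: each sample inherits its pile's score.
sample_scores = {sid: score for score, ids in piles.items() for sid in ids}

# Step 3: piles at or above the cutoff are acceptable.
acceptable = {sid: score >= cutoff for sid, score in sample_scores.items()}

for sid in sorted(sample_scores):
    verdict = "acceptable" if acceptable[sid] else "unacceptable"
    print(f"{sid}: score {sample_scores[sid]:3d} -> {verdict}")
```

With replicates included, per-sample mean scores across observers then feed the ANOVA step described later.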
48
Sample Scoring via EXCEL Template
•Tester input in yellow boxes
• Observer name entered
• Piles are scored and entered
under Magnitude
• Acceptability Cutoff = 81
• Every sample under the leftmost pile (80) gets a “1”
• Samples 7, 13, 15, 17, 19, 21,
23
• Etc.
• Number of samples recorded is
calculated both down and over (see
“Checks”)
•Data ready for analysis SW
[Excel template grid: 35 samples × 5 categories for observer “Michelle”; category magnitudes 80, 89, 92, 95, 100; acceptability cutoff 81; a “1” marks each sample’s category; “Checks” totals verify one category per sample and the per-category counts (Total Found = 35: 7, 11, 8, 2, 7).]
Proctor's Notes:
1) Enter the Observer's ID above.
2) Pick up the samples from
Category 1. Enter the number 1
(one) at the grid location defined by
the row (Sample Number) and
column (Category) in which the
sample is found.
3) Repeat step 2 with samples
from each of the successive
categories.
4) Have the Observer assign a
Magnitude (relating to "goodness")
for each Category. Magnitude
must be between 1 & 100.
Total Found must equal the number of
Samples.
The value in the row of "Checks" must
equal the number of samples found in
each Category.
All "Check Cells" are formatted in Bold
Red when the data are not correct.
49
Data Ready for Analysis Software
•First compute mean scores from raw data
• Run ANOVA model – fixed by DOE design
• Linear, response surface, D-Optimal, whatever
• Is model a decent one?
• R² & R²(adj) – how much of the total variation is explained by the model
• Go for parsimony – no insignificant factors included
• RMSE reasonable?
• If model decent, save prediction formula
• Compare Predicted vs. Actual
• Make significance tests
• Orthogonal contrasts/comparisons
50
ANOVA Analysis
•Summary of fit, whole model ANOVA, Effect tests, etc.
[Scatter: Mean(Score) vs. P(Mean Score), 0-100, with bivariate normal ellipse (P = 0.950) and linear fit.]

Summary of Fit
RSquare: 0.926
RSquare Adj: 0.915
Root Mean Square Error: 3.864
Mean of Response: 51.254
Observations (or Sum Wgts): 24

Analysis of Variance
Source   | DF | Sum of Squares | Mean Square | F Ratio | Prob > F
Model    |  3 | 3726.3512      | 1242.12     | 83.1937 | <.0001*
Error    | 20 | 298.6083       | 14.93       |         |
C. Total | 23 | 4024.9596      |             |         |

Effect Tests
Source | Nparm | DF | Sum of Squares | F Ratio | Prob > F
mode   |   2   |  2 | 2802.5508      | 93.8537 | <.0001*
image  |   1   |  1 | 923.8004       | 61.8737 | <.0001*
51
Analysis: Logistic Regression
•Raw score data used as regressor (X
value)
• Binary yes/no used as response (Y
value)
• Result, if decent, is a probability
prediction
• When score = xx, probability of
acceptance = yy
• Essentially, this is a calibration of the
score scale
• Area under blue line = prob. of NOT
being acceptable
• Area above blue line = prob. of being
acceptable
[Logistic fit: probability of “acceptable” (0.00-1.00) vs. score (0-100), with the fitted curve separating the no/yes regions.]
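A minimal sketch of this calibration, fitting a logistic curve to hypothetical score/acceptability data by Newton-Raphson (IRLS); the data are illustrative, not from the study:

```python
import math

# Toy logistic calibration of the score scale (hypothetical data):
# raw scores as the regressor X, binary acceptable yes/no as response Y.
scores = [20, 30, 40, 50, 55, 60, 70, 80, 90]
accept = [0, 0, 0, 0, 1, 0, 1, 1, 1]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Fit P(accept) = sigmoid(b0 + b1*x) by Newton-Raphson (IRLS),
# centering the scores for numerical stability.
xbar = sum(scores) / len(scores)
xs = [s - xbar for s in scores]
b0 = b1 = 0.0
for _ in range(25):
    p = [sigmoid(b0 + b1 * x) for x in xs]
    g0 = sum(y - pi for y, pi in zip(accept, p))               # gradient
    g1 = sum((y - pi) * x for y, pi, x in zip(accept, p, xs))
    w = [pi * (1 - pi) for pi in p]                            # IRLS weights
    h00 = sum(w)
    h01 = sum(wi * x for wi, x in zip(w, xs))
    h11 = sum(wi * x * x for wi, x in zip(w, xs))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det                          # Newton step
    b1 += (h00 * g1 - h01 * g0) / det

prob = sigmoid(b0 + b1 * (75 - xbar))
print(f"P(acceptable | score = 75) = {prob:.2f}")
```

In practice this would come from a statistics package; the point is that the fitted curve calibrates the score scale into a probability of acceptance.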
52
Variance Components Study using DASC Methodology
Photo Media Artifact
There was an unwanted artifact when printing on photo media
Popular opinion dictated that it was a media related problem
Set up a study where 4 media were printed across 3 randomly selected
printers with a variety of image content

[Variability chart: artifact magnitude (20-90) for printers B, C, M within each of the four media (Do, Pr, Sp, St). Variance components: Media 40%, Media*Printer 7%, Printer 42%, Within 12%.]
53
Logistic Regression of Photo Media Artifact Study
Point of Subjective Equality ≈ 72 on the Magnitude scale
JND (as determined by the 75th minus the 50th percentile) ≈ 5 units
Printer-to-printer differences could be as extreme as 40 Magnitude units
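Reading the PSE and JND off a fitted logistic curve is a short calculation. The coefficients below are hypothetical values chosen to roughly reproduce the slide's PSE ≈ 72 and JND ≈ 5:

```python
import math

# Reading PSE and JND off a fitted logistic curve. The coefficients are
# hypothetical, picked to roughly match the slide's reported values.
b0, b1 = -15.84, 0.22        # assumed fit: P = sigmoid(b0 + b1*magnitude)

def percentile(p):
    """Magnitude at which the fitted probability equals p."""
    return (math.log(p / (1 - p)) - b0) / b1

pse = percentile(0.50)           # point of subjective equality (50% point)
jnd = percentile(0.75) - pse     # JND = 75th minus 50th percentile

print(f"PSE ~ {pse:.1f}, JND ~ {jnd:.1f} magnitude units")
```

Inverting the sigmoid this way is how the 50th and 75th percentiles quoted on the slide are obtained from the regression coefficients.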
54
Comparison of Psychometric Methods
Criteria                | Rank Order       | Category Rating | Paired Compr. | Rating Scales | Magnitude Estimation | Comrey Constant Sum PC | DASC
------------------------|------------------|-----------------|---------------|---------------|----------------------|------------------------|----------
Indirect / Direct       | Indirect         | Indirect        | Indirect      | Direct        | Direct               | Direct                 | Hybrid
# Observers             | Mod              | High            | Low           | High/Mod      | Mod                  | Low                    | Mod/High
# Presentations         | n                | n               | n(n-1)/2      | n             | n                    | n(n-1)/2               | n
Range of Stimuli        | Mod              | Wide            | Small         | Wide          | Mod                  | Small                  | Small/Mod
Numerical Scale         | Nominal/Interval | Interval        | Interval      | Interval      | Ratio                | Ratio                  | Interval
Ability to Discriminate | High             | Mod             | High          | Mod           | Mod                  | High                   | Mod
Scale Variability       | Mod              | High            | Mod           | High          | High                 | Mod/High               | Mod
Training Reqd.          | No               | No              | No            | No            | Yes                  | Yes                    | No
Test Duration           | Short            | Short           | Mod           | Short         | Mod/High             | Mod                    | Mod
55
Acknowledgements
Kodak Contributors:
Dana Aultman*
Randy Dumas
Wayne Richard
Steve Billow
References:
Barker, Thomas, class notes from Empirical Modeling (CQAS 0875)
Engeldrum, Peter G. (2000), Psychometric Scaling: A Toolkit for Imaging Systems Development, Imcotek Press
Miller, M., IST: Psychophysics, Eastman Kodak internal reference
Lodge, Milton (1981), Magnitude Scaling: Quantitative Measurement of Opinions, Sage Publications
*Retiree as of 12/07
56
Summary
High Level Overview of Psychometric Theory
Experimental Conditions
Review of Different Methods (Including Trade-Offs)
•Direct
•Indirect
•Hybrid
Reviewed Some Practical Examples
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodologysmumbahelp
 
Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.pptmanaswidebbarma1
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptxBillyMoses1
 
Scaling in research
Scaling  in researchScaling  in research
Scaling in researchankitsengar
 
Measurement and scaling techniques
Measurement  and  scaling  techniquesMeasurement  and  scaling  techniques
Measurement and scaling techniquesUjjwal 'Shanu'
 
Chotu scaling techniques
Chotu scaling techniquesChotu scaling techniques
Chotu scaling techniquesPruseth Abhisek
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniquesSarfaraz Ahmad
 
Test design made easy (and fun) Rik Marselis EuroSTAR
Test design made easy (and fun) Rik Marselis EuroSTARTest design made easy (and fun) Rik Marselis EuroSTAR
Test design made easy (and fun) Rik Marselis EuroSTARRik Marselis
 
Multi variate presentation
Multi variate presentationMulti variate presentation
Multi variate presentationArun Kumar
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniquesKritika Jain
 
1_Design and Analysis of Experiment_Data Science.pptx
1_Design and Analysis of Experiment_Data Science.pptx1_Design and Analysis of Experiment_Data Science.pptx
1_Design and Analysis of Experiment_Data Science.pptxdeepak667128
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsAlejandro Bellogin
 

Similar to Psychometric Studies in the Development of an Inkjet Printer (20)

computer application in pharmaceutical research
computer application in pharmaceutical researchcomputer application in pharmaceutical research
computer application in pharmaceutical research
 
Research methodology for business .pptx
Research methodology for business .pptxResearch methodology for business .pptx
Research methodology for business .pptx
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodology
 
Paul Gerrard - Advancing Testing Using Axioms - EuroSTAR 2010
Paul Gerrard - Advancing Testing Using Axioms - EuroSTAR 2010Paul Gerrard - Advancing Testing Using Axioms - EuroSTAR 2010
Paul Gerrard - Advancing Testing Using Axioms - EuroSTAR 2010
 
Measurement and scaling
Measurement and scalingMeasurement and scaling
Measurement and scaling
 
Measurement and scaling
Measurement and scalingMeasurement and scaling
Measurement and scaling
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodology
 
Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.ppt
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptx
 
Scaling in research
Scaling  in researchScaling  in research
Scaling in research
 
Measurement and scaling techniques
Measurement  and  scaling  techniquesMeasurement  and  scaling  techniques
Measurement and scaling techniques
 
Chotu scaling techniques
Chotu scaling techniquesChotu scaling techniques
Chotu scaling techniques
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniques
 
Test design made easy (and fun) Rik Marselis EuroSTAR
Test design made easy (and fun) Rik Marselis EuroSTARTest design made easy (and fun) Rik Marselis EuroSTAR
Test design made easy (and fun) Rik Marselis EuroSTAR
 
Multi variate presentation
Multi variate presentationMulti variate presentation
Multi variate presentation
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniques
 
1_Design and Analysis of Experiment_Data Science.pptx
1_Design and Analysis of Experiment_Data Science.pptx1_Design and Analysis of Experiment_Data Science.pptx
1_Design and Analysis of Experiment_Data Science.pptx
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
ch 13.pptx
ch 13.pptxch 13.pptx
ch 13.pptx
 

Recently uploaded

Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 

Recently uploaded (20)

Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 

Psychometric Studies in the Development of an Inkjet Printer

  • 1. Psychometric Studies in the Development of the Kodak AiO Inkjet Printer David Lee Inkjet Systems Division Eastman Kodak Company
  • 2. 2 The Kodak Value Proposition … leading to three key vectors of differentiation (“Implementing Kodak’s Digital Imaging System Strategy”): IMAGE QUALITY (Real Kodak Pictures: look, feel, durability, and fade-resistance of Kodak pictures), ONE BUTTON SIMPLE & FAST (REALLY EASY: simple, convenient photos and text, with and without a PC), REAL VALUE (affordable pictures, Kodak quality)
  • 3. 3 Agenda Introduction / Overview • What is psychometrics? • Thresholds, Just-Noticeable Differences and the Psychometric Function • Why do we need it? • Class example Experimental Considerations • Unbiased • Environmental considerations (i.e., Lighting) • Issues & Challenges – Number of samples Technical Aspects with Examples • Measurement/Numerical Scales • Selected scaling methodologies and data analysis – Direct versus indirect methods – Hybrid DASC Method Summary/Acknowledgements
  • 4. 4 What is Psychometrics? What is psychophysics/psychometrics? I’m not sure! • The branch of psychology concerned with quantitative relations between physical stimuli and their psychological effects • Scientific approach to studying relationships between human perception (psychology) and measured physical characteristics: human perception ↔ physical measurements
  • 5. 5 Thresholds and Just-Noticeable Differences (jnd) Two Key Questions… “What is the value of the characteristic that is just visible or just detectable?” “What is the minimum value of the characteristic that is seen as different from a reference or standard?” For example, if we present a stimulus varying in intensity, the observer responds with ‘Yes’ or ‘No’ indicating they see/feel/observe the difference. • Estimate the probability or likelihood of a ‘successful’ outcome • Think of shining a light in a dark room starting at a very low intensity • The voltage is slowly increased until the observer notices a difference • Some people will be more sensitive and notice a difference earlier, while others won’t see a difference until the power is much higher
  • 6. 6 The Psychometric Function [Figure: proportion of ‘Yes’ responses (0 to 1) plotted against stimulus intensity (5 to 13 volts), linking psychology to physics; the 50% point of the curve marks the stimulus (absolute) threshold, and its steepness determines the just-noticeable difference]
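The yes/no procedure described above can be turned into numbers by fitting a logistic psychometric function to the response proportions and reading off the 50% point as the threshold. A minimal sketch with made-up detection data (the voltages, counts, and grid ranges are illustrative, not from the study):

```python
import math

# Hypothetical yes/no detection data: voltage -> (count of 'Yes', trials)
data = {5: (1, 20), 6: (2, 20), 7: (5, 20), 8: (10, 20),
        9: (15, 20), 10: (18, 20), 11: (19, 20), 12: (20, 20)}

def p_yes(x, threshold, slope):
    """Logistic psychometric function; threshold is the 50% 'Yes' point."""
    return 1.0 / (1.0 + math.exp(-slope * (x - threshold)))

def nll(threshold, slope):
    """Binomial negative log-likelihood of the data under the model."""
    total = 0.0
    for x, (yes, n) in data.items():
        p = min(max(p_yes(x, threshold, slope), 1e-9), 1 - 1e-9)
        total -= yes * math.log(p) + (n - yes) * math.log(1 - p)
    return total

# Crude maximum-likelihood fit by grid search (good enough for a demo)
threshold, slope = min(((t / 10, s / 10)
                        for t in range(50, 121) for s in range(1, 40)),
                       key=lambda ps: nll(*ps))
print(f"threshold ~ {threshold} V, slope ~ {slope}")
```

In practice a proper optimizer (or a probit/logit regression) replaces the grid search, but the idea is the same: the fitted threshold is the absolute threshold, and the slope sets the jnd.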
  • 7. 7 Why do we need psychophysics? Ties the Voice of the Customer (VOC) to… Marketing: • Benchmarking • Where do we stand wrt competitors? • Program Goals • What’s the ‘best’ level of _______ Product Design & Development: • Design Trade-offs: • Print quality versus print speed • Tonescale (Color Gamut) versus ink laydown • Troubleshooting • How noticeable is an artifact? • Monitor Development Progress • Where are we with things? • Has the perception of a characteristic changed over time?
  • 8. 8 Let’s try an example or two…
  • 9. 9 Experimental Instructions for Two-Anchor Numerical Rating Lightness Experiment Remove the two (2) reference anchors from the envelope labeled “anchors” and place the darker one to your left and the lighter one to your right. As marked towards the bottom, these anchors represent lightness levels of zero (0) and 100, respectively. Next, remove the seven (7) sample patches from the envelope labeled “numerical rating samples” and place them face-down to your side. Your task is to compare the lightness of each sample to the two anchors, one sample at a time, and select a number between zero (0) and 100 that best represents its lightness level. After you make each of your decisions, place the sample to the side, face down, and record the lightness value you chose next to the matching letter ID on the sample score-sheet. Any questions? [Diagram: low anchor = 0, high anchor = 100, test sample evaluated one at a time] Ref.: samples J, E, L, S, C, Y, D
  • 10. 10 Experimental Instructions for Two-Anchor Numerical Rating Lightness Experiment (Continued) Double Anchor Numerical Rating Score Sheet Sample ID Score J E L S C Y D
  • 11. 11 Discussion from Double-Anchor Numerical Rating (Lightness) Experiment Lay your samples out in front of you based on the lightness scores that you assigned to each sample. Are your numerical scores generally consistent over the wide range of lightness? Can you use your scores to reliably detect small differences between samples? Are all of your scores consistent with relative lightnesses when you directly compare samples (e.g. samples ‘Y’ and ‘D’)? Would you consider this type of methodology better for: A) wide range scaling at lower resolutions? B) high resolution of small differences between samples? Do you believe that this type of method would be practical for relatively large numbers of samples (e.g. handling, difficulty of evaluation, etc.)? [Scale graphic: 0 to 100 in steps of 10]
  • 12. 12 Numerical Ratings: Advantages/Limitations The task of numerical rating is relatively simple for observers This method is compatible with physical or imaginary anchors. •Double anchor of zero to 100 The numerical rating method is effective for lower accuracy/precision characterizations over wide ranges of perception. It is also compatible with larger numbers of samples. The numerical rating method is generally not effective for detecting small differences between samples. This is because of our inherent limitations in estimating absolute magnitudes over wide ranges (Engeldrum). Data are approximately linear (interval or ratio scales)
  • 13. 13 Experimental Instructions for Rank Order Lightness Experiment Please remove the six (6) samples (B, O, R, H, A & M) from the envelope marked “rank order samples”. Comparing all six (6) samples to each other, your task is to arrange them in order of increasing lightness with the darkest sample towards your left and the lightest towards your right. Once you are satisfied with your rank ordering of lightness, record the order of your samples on the rank order score-sheet that is provided and position your samples in that order on the sheet. Any questions? darkest lightest
  • 14. 14 Rank Order Score Sheet Darkest Lightest
  • 15. 15 Discussion from Rank Order Lightness Experiment Do you believe that this is an effective method for detecting small lightness differences between samples? Do you believe that this would be an effective method for scaling samples over wide ranges? Could this method be used with a large number of samples? What do you believe are the strengths and weaknesses compared to the numerical rating method? What would you conclude about the relative lightness between two given samples where 50% of the observers ranked sample A lighter than B and 50% ranked sample B lighter than A? If 75% of observers ranked sample A lighter than B and 90% of observers ranked sample B lighter than C, would you conclude that there is a larger lightness difference between A-B or B-C?
  • 16. 16 Rank Order Method: Advantages/Limitations The rank order “confusion” method is very effective for detecting small differences between samples. This is based on our innate competencies in making close comparative judgments. It is also effective for evaluating multi-dimensional trade-offs such as overall IQ. There are practical limits to numbers of samples (e.g. 8, 10, 15?) Rank ordering tends to lose its effectiveness over relatively wide perceptual ranges. Rank order methods afford direct access to “preference”, “acceptability”, and “noticeability” data. We cannot assume that raw rank order data are linear; they are ordinal… We can leverage “observer confusion” to derive interval scales using Thurstone’s “law of comparative judgments”
  • 17. 17 Relative Lightnesses of Sample Patches
      ID      Relative Lightness   RGB
      Black   0                    0
      J       5                    13
      E       20                   51
      L       45                   115
      S       58                   148
      C       68                   173
      O       80                   204
      R       81                   207
      H       82                   209
      A       83                   212
      M       84                   214
      Y       84                   214
      D       85                   217
      B       94                   240
      White   100                  255
  • 18. 18 Experimental Considerations •In addition to the issues of conducting any designed experiment you must be aware of the following: • Follow the same protocol/methodology for each observer – Keep the test unbiased! • Issues & Challenges – Range and distribution of the characteristic » Number of samples versus objective – Image content » 1st party versus 3rd party images » Make sure the content spans the range of interest – Number & type of observers » Expert versus average users – Sample preparation, handling and maintenance » Sample borders
  • 19. 19 Environmental Considerations Presenting & viewing the samples • Presentation Mode – All-at-once versus one-at-a-time – Viewing Distance – Environmental factors » Make sure the observer is comfortable • Scene lighting: daylight, tungsten, fluorescent
  • 20. 20 Classification of Scale Types
      Scale      Principle/Definition                            Example                                  Arithmetic Operations                         Statistical Operations
      Nominal    Identity classes                                Male vs. Female                          Counting                                      χ² analysis, binomial
      Ordinal    Order                                           Order of finish in a race                Greater-than or less-than operations          Median, IQR, nonparametric
      Interval   Ordering is known wrt an attribute              Temp (°F); most personality measures     Addition and subtraction of scale values      Mean, ANOVA, multivariate
      Ratio      There is a rational zero point for the scale    Temp (°K); distance                      Multiplication and division of scale values   Geometric mean, ANOVA
  • 21. 21 Direct versus Indirect Scaling Direct Methods Judgments of the magnitude are made in direct scaling through directly assigning numbers that correspond to the level of the stimuli •Judgments are often made relative to a reference or anchor Numeric estimation provides a very simple and easy method toward constructing interval scale estimates •Reliability of the estimate Methods include • Two-Anchor Grayscale Example • Line or Ruler Based Systems • Comrey Constant Sum Paired Comparison • Magnitude Estimation Observers are asked to assign a value to each stimulus. Each stimulus is observed only once and one-at-a-time •Tendency for a ‘learning curve’ effect
  • 22. 22 Stevens’ Power Law: Ψ = R = k·S^b, so log(R) = b·log(S) + log(k). Where: Ψ = subjective magnitude, one’s impression of the stimulus magnitude; R = the actual, but often unknown, magnitude of the response; S = physical magnitude of the stimulus; b = characteristic growth exponent; k = constant of proportionality/scaling constant
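Because log(R) = b·log(S) + log(k) is linear, the growth exponent b and scaling constant k can be recovered from magnitude-estimation data with an ordinary least-squares line through the log-log pairs. A sketch with fabricated (S, R) pairs generated near R = 2·S^0.8 (all numbers illustrative):

```python
import math

# Fabricated (stimulus, response) magnitude-estimation pairs near R = 2 * S**0.8
samples = [(1, 2.0), (2, 3.5), (4, 6.1), (8, 10.6), (16, 18.4)]

# Least-squares line through (log S, log R): slope = b, intercept = log k
xs = [math.log10(s) for s, _ in samples]
ys = [math.log10(r) for _, r in samples]
n = len(samples)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
k = 10 ** (my - b * mx)
print(f"growth exponent b ~ {b:.2f}, scaling constant k ~ {k:.2f}")
```

The fitted slope recovers the exponent used to generate the data, which is exactly the ‘signature’ interpretation of b discussed on the next slide.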
  • 23. 23 Exponents of the Power Law • Exponents are interpreted as a ‘signature’ characterizing the impression of that specific sensation via numeric estimation • Reported correlations on the order of 99% • Critics argued that solely relying on numeric estimates made it impossible to: • Verify the Power Law • Confirm the characteristic exponent • Prove superiority of magnitude scaling techniques
  • 24. 24 Cross Modality Matching Assumptions •IF the coefficients in the previous table truly do represent the characteristic exponent, then any of these stimuli could be used in lieu of numeric estimation •Results support the characteristic exponents shown in the earlier table •Responses other than numbers can be successfully utilized
  • 25. 25 Here’s Where the Rubber Hits the Road… Any two quantitative response measures with established characteristic exponents can be used to judge sensory stimuli (i.e. Text Quality) ‘Validity’ is determined by comparing the theoretical to empirical ratios between the two response measures Yields ratio-scale estimates of the sensory stimuli suitable for statistical analyses (i.e. ANOVA & Hypothesis Testing) [Diagram: stimulus S elicits sensation ψ, measured by two responses R1 and R2] log(R1) = b1·log(S) + log(k1); log(R2) = b2·log(S) + log(k2)
  • 26. 26 Text Quality Example Finally… Problem: •Define the minimum acceptable text quality for draft mode printing •Quantify the effect of ‘Factor A’ and ‘Factor B’ on perceived Text Quality •Understand trade-offs • Map out response as a function of input factors Protocol: •Test Samples were generated by simulation •Factor settings data taken from competitive assessments •Factors A & B were the only settings changed between simulations •10x Scaling – Subjects were seated 5’ from samples for a 6-10” equivalent viewing distance.
  • 27. 27 General Procedure Calibration Phase •Subjects are given instruction and practice in the use of two quantitative response methods •Allows for determining if subjects judgments are proportional and to what degree • Can be used to estimate the effects of bias and provide correction Scaling Phase: •Repeat same procedure only now with stimuli of interest
  • 29. 29 Bias Correction / Results of the Calibration Phase For numerical estimation of a specified line length, the calibration fit was y = 0.8706x + 0.2605 (R² = 0.9956) for log(numeric estimate) versus log(actual line length). Similar results were seen with line length estimation. Can be used to test if individuals are utilizing proportional judgments that track across the continuum. [Figure: log(numeric estimate) versus log(actual line length), with theoretical and calibration fit lines]
  • 30. 30 Calculation of Ratio Scale Values Under ideal conditions, NE^1.0 = LP^1.0. To compute magnitude scale values, ψ = (NE^1.0 · LP^1.0)^0.5. To correct for bias, ψ = (NE^(1/n) · LP^(1/n))^0.5, where n = exponent as determined from the calibration exercise
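The scale value is just a (bias-corrected) geometric mean of the two response measures. A sketch with hypothetical responses; the slide writes a single calibration exponent n, and here a separate exponent per response measure is assumed for illustration (all numbers are made up):

```python
# Hypothetical responses for one stimulus:
# NE = numeric estimate, LP = produced line length
NE, LP = 45.0, 52.0

# Assumed calibration exponents for each response measure; 1.0 would mean
# perfectly proportional (unbiased) judgments
n_ne, n_lp = 0.87, 0.91

# Ideal (unbiased) case: plain geometric mean of the two responses
psi_ideal = (NE * LP) ** 0.5

# Bias-corrected case: each response is "straightened" by its calibration
# exponent before taking the geometric mean
psi = (NE ** (1 / n_ne) * LP ** (1 / n_lp)) ** 0.5
print(round(psi_ideal, 1), round(psi, 1))
```

Because both calibration exponents are below 1 (compressive bias), the corrected ψ sits above the raw geometric mean, which is the intended effect of the correction.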
  • 31. 31 Comparison of Text Quality Responses After Bias Correction Yellow lines represent minimum acceptable text quality as determined by the observers. [Figure: corrected log(produced line length) versus corrected log(numerical estimate) for 20 samples, with theoretical fit and corrected-response fit y = 0.933x + 0.1421, R² = 0.8442]
  • 32. 32 Variability Chart for the Composite Metric The strong effect of Factor A is clearly seen. The variability of the replicates was very close. [Figure: composite metric (1.5 to 2.1) plotted for Factor B levels (-1.00, -0.33, 0.33, 1.00) within each Factor A level]
  • 33. 33 ANOVA Results Summary of Fit: RSquare 0.970998; RSquare Adj 0.96064; Root Mean Square Error 0.033764; Mean of Response 1.831543; Observations (or Sum Wgts) 20. [Prediction profiler: composite metric 2.043306 ± 0.036828 at Factor A = 1, Factor B = 0.16757; desirability 0.890101]
  • 34. 34 Comparison of Text Quality Responses After Bias Correction In addition to understanding the factors, the minimum acceptable levels for Factors A & B are also understood! [Figure: same corrected-response plot as slide 31 (20 samples; y = 0.933x + 0.1421, R² = 0.8442)]
  • 35. 35 Direct Scaling Summary Able to yield ratio scale numbers for subsequent analyses Considerable effort: magnitude estimation required participant training as well as calibration steps Some participant fatigue was evident Some participants were not following proportional judgments
  • 36. 36 Direct versus Indirect Scaling Indirect Methods Methods used to create a psychological scale in which information regarding an individual’s ability to detect or discriminate stimuli is used to infer the magnitude of a sensation Typically easier to conduct and communicate to participants • Magnitude estimation required training the participants as well as conducting a calibration step • Data analysis requires additional theory for interval scale generation • Thurstone’s Law of Comparative Judgment Methods include: • Rank Ordering • Successive Categories • Category Scaling • Paired Comparison
  • 37. 37 Rank Ordering and Successive Categories Rank Ordering Quick and easy, but loses information, and the distances between ranks are non-uniform Successive Categories Difficult to keep equal distances between categories Information may be lost due to the limited resolution of the categories
  • 38. 38 Thurstone’s Law of Comparative Judgment Each stimulus is perceived through a ‘discriminal process’ that has some value An observer may vary his or her response for the same stimulus when viewed multiple times •Resulting in a ‘confusion’ level By presenting the stimulus multiple times to the same observer and/or to different observers, a frequency distribution can be determined over the psychological continuum •Psychometric function Thurstone assumed that the distribution is Gaussian with a specific mean and standard deviation, which he called the scale value and discriminal spread •Thurstone describes six cases with various assumptions •Case V assumes no correlation between the stimuli and constant variance across the different stimuli
  • 39. 39 Thurstone’s Law of Comparative Judgment Case 5 [Chart: frequency of response vs. the psychological response continuum for an example of 3 stimuli, shown as Gaussian distributions with means at 3, 7, and 11]
  • 40. 40 Example of Case 5 Applied to a Category Rating Experiment Suppose we run a Categorical Rating experiment with the following 5 categories: very bad, bad, neither bad nor good, good, very good Assume that we had 7 samples and 15 participants

  Scoring matrix (counts per category):

            very bad   bad   neither   good   very good
  sample 1      2       1       8       3        1
  sample 2      1       1       4       8        1
  sample 3      6       1       6       1        1
  sample 4      2       1       4       5        3
  sample 5      6       2       4       1        2
  sample 6      4       1       7       1        2
  sample 7      1       1       1       3        9

  Convert the scoring matrix to a cumulative count matrix:

            very bad   bad   neither   good   very good
  sample 1      2       3      11      14       15
  sample 2      1       2       6      14       15
  sample 3      6       7      13      14       15
  sample 4      2       3       7      12       15
  sample 5      6       8      12      13       15
  sample 6      4       5      12      13       15
  sample 7      1       2       3       6       15
  • 41. 41 Example of Case 5 (Continued) Convert to a cumulative proportion matrix:

            very bad    bad    neither   good   very good
  sample 1    0.133    0.200    0.733   0.933    1.000
  sample 2    0.067    0.133    0.400   0.933    1.000
  sample 3    0.400    0.467    0.867   0.933    1.000
  sample 4    0.133    0.200    0.467   0.800    1.000
  sample 5    0.400    0.533    0.800   0.867    1.000
  sample 6    0.267    0.333    0.800   0.867    1.000
  sample 7    0.067    0.133    0.200   0.400    1.000

  Convert to z-scores, then calculate means and standard deviations with some additional scaling; perform similar operations for the ratings (Very Bad … Very Good) to determine the category boundaries:

               z-scores at the category boundaries     mean    std-dev
  sample 1   -1.130   -0.840    0.610    1.480         0.030    1.230
  sample 2   -1.560   -1.130   -0.250    1.480        -0.365    1.345
  sample 3   -0.250   -0.100    1.080    1.480         0.553    0.858
  sample 4   -1.130   -0.840   -0.100    0.840        -0.308    0.879
  sample 5   -0.250    0.080    0.840    1.080         0.438    0.626
  sample 6   -0.640   -0.440    0.840    1.080         0.210    0.875
  sample 7   -1.560   -1.130   -0.840   -0.250        -0.945    0.550
  • 42. 42 Case 5 (Continued) Interval scale estimates of the psychological response are calculated Boundary limits are also established for the rating levels This method is very useful for competitive assessments as well as for tracking progress over time [Bar charts: relative scale value for samples 1-7, and upper category boundaries for very bad, bad, neither bad nor good, and good]
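  The Case V arithmetic above (cumulative proportions to z-scores to a scale value per sample) can be sketched in a few lines of Python. This is a minimal illustration using the "sample 1" row; the probit is computed by bisection so only the standard library is needed, and the small differences from the slide's values (mean 0.030, std-dev 1.230) come from z-table rounding on the slide.

  ```python
  import math

  # Cumulative proportions for "sample 1" at the four interior category
  # boundaries.  The final column (1.000) is dropped, since z(1.0) is
  # undefined.
  cum_props = [0.133, 0.200, 0.733, 0.933]

  def z_score(p):
      """Inverse standard normal CDF (probit), computed by bisection."""
      lo, hi = -6.0, 6.0
      for _ in range(80):
          mid = (lo + hi) / 2.0
          if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
              lo = mid
          else:
              hi = mid
      return (lo + hi) / 2.0

  zs = [z_score(p) for p in cum_props]
  mean_z = sum(zs) / len(zs)            # Case V scale value for the sample
  sd_z = (sum((z - mean_z) ** 2 for z in zs) / (len(zs) - 1)) ** 0.5

  # Sanity check: the probit should invert the normal CDF.
  roundtrip = 0.5 * (1.0 + math.erf(z_score(0.8) / math.sqrt(2.0)))
  ```

  Repeating this for every row of the proportion matrix yields the interval scale estimates plotted on the slide.
  
  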
  • 43. 43 Indirect Scaling Summary Rank ordering and successive categories are easy tests to administer •Participants don't require special training • Other than the basic instructions for performing the task at hand •Provide limited information Thurstone's Law of Comparative (Categorical) Judgment provides a means to determine a psychological scale from comparison (category) judgments • Requires assumptions about the distributions of both the psychological response and the category boundaries • Yields interval scale numbers Paired comparisons (PC) can be utilized when the number of stimuli is fairly low and a fine level of distinction is necessary •PC can be extended to yield ratio scale numbers -> Comrey Constant Sum PC •Other methods…
  • 44. 44 DASC Method DASC: Double Anchor Successive Categories Method Useful when you have: • Lots of samples (up to ≈ 40) • Need for an interval number for ANOVA analysis (DOE’s, etc.) •Combination of Successive Categories and Magnitude Estimation Method has three steps: • Step 1: Organize samples into piles • Step 2: Score each pile • Step 3: Define score where acceptability begins
  • 45. 45 Step 1: Organize the Piles Step 1: Organize samples into piles, with worst on left, best on right • How many piles? Driven by # samples to be tested. • 3 piles x 3 samples/pile ≈ 9 samples max • 4 piles x 4 samples/pile ≈ 16 samples max • 5 piles x 5 samples/pile ≈ 25 samples max, etc. • Only requirement is to end up with correct # piles sorted left to right • Replicates are always included to get measure of experimental error • Using piles is just a quick (and easier) way to score many samples
  • 46. 46 Step 2: Score Each Pile Step 2: Score each pile • Double Anchors • Score = 1: Worst possible _____________ • Score = 100: Best possible _____________ • If piles are similar to each other, scores should be similar • If piles not similar, scores should reflect that • Scores do not have to fill entire 1-100 score range • Scores do not have to be evenly spaced • It is unlikely that “real” samples will hit either anchor; if so, clarify instructions • Every sample within a pile inherits that pile’s score
  • 47. 47 Step 3: Get a Score for Overall Acceptability Step 3: Get Score for Acceptability • Starting at Score = 1 and moving to the right, find the hypothetical Score where acceptability goes from unacceptable to acceptable for each observer • Score does not have to equal a pile score • Value is used to decide whether each sample in the test is unacceptable or acceptable • Example: If Score for Acceptability = 65, then: • Piles < 65 are unacceptable • Piles ≥ 65 are acceptable
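  The three DASC steps above reduce to a small scoring routine: every sample inherits the magnitude of the pile it was sorted into, and the acceptability cutoff turns that score into a binary verdict. The sketch below is a minimal illustration; the pile magnitudes, sample-to-pile assignments, and cutoff are hypothetical, not taken from the study.

  ```python
  # Step 2: the observer's magnitude (1-100) for each pile, worst to best.
  pile_scores = {1: 80, 2: 89, 3: 92, 4: 95, 5: 100}
  # Step 1: which pile each sample was sorted into (hypothetical sort).
  sample_to_pile = {7: 1, 13: 1, 8: 2, 22: 3, 30: 4, 5: 5}
  # Step 3: this observer's score where acceptability begins.
  cutoff = 81

  # Every sample inherits the score of its pile.
  sample_scores = {s: pile_scores[p] for s, p in sample_to_pile.items()}
  # Piles scoring at or above the cutoff are acceptable.
  acceptable = {s: score >= cutoff for s, score in sample_scores.items()}
  ```

  With a cutoff of 81, every sample in the score-80 pile is judged unacceptable and everything in the higher piles is acceptable, mirroring the cutoff example on the previous slide.
  
  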
  • 48. 48 Sample Scoring via EXCEL Template •Tester input in yellow boxes • Observer name entered • Piles are scored and entered under Magnitude • Acceptability Cutoff = 81 • Every sample under the left-most pile (80) gets a “1” • Samples 7, 13, 15, 17, 19, 21, 23 • Etc. • Number of samples recorded is calculated both down and over (see “Checks”) •Data ready for analysis [Template excerpt: Observer Michelle; Categories 1-5 scored with Magnitudes 80, 89, 92, 95, 100; 35 samples total, with 7, 11, 8, 2, and 7 samples found per category; one category must be found per sample] Proctor’s Notes: 1) Enter the Observer’s ID. 2) Pick up the samples from Category 1 and enter the number 1 (one) at the grid location defined by the row (Sample Number) and column (Category) in which each sample is found. 3) Repeat step 2 with the samples from each of the successive categories. 4) Have the Observer assign a Magnitude (relating to “goodness”) for each Category; Magnitude must be between 1 & 100. Total Found must equal the number of Samples, the value in each row of “Checks” must equal the number of samples found in that Category, and all “Check” cells are formatted in bold red when the data are not correct.
  • 49. 49 Data Ready for Analysis Software •First compute mean scores from raw data • Run ANOVA model – fixed by DOE design • Linear, response surface, D-Optimal, whatever • Is model a decent one? • R2 & R2 (adj) – how much of total variation explained by model • Go for parsimony – no insignificant factors included • RMSE reasonable? • If model decent, save prediction formula • Compare Predicted vs. Actual • Make significance tests • Orthogonal contrasts/comparisons
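  The model-adequacy checks above (R², adjusted R², RMSE) can be computed directly for a one-factor linear fit. This sketch uses made-up scores, not the study's data, and a single slope term to keep the arithmetic visible.

  ```python
  # Hypothetical coded factor levels and mean scores (two replicates).
  xs = [-1.0, -0.5, 0.0, 0.5, 1.0, -1.0, -0.5, 0.0, 0.5, 1.0]
  ys = [40.0, 46.0, 52.0, 57.0, 64.0, 41.0, 45.0, 51.0, 58.0, 63.0]

  n, p = len(xs), 1                      # observations, model terms (slope)
  xbar = sum(xs) / n
  ybar = sum(ys) / n

  # Ordinary least-squares slope and intercept.
  slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
          sum((x - xbar) ** 2 for x in xs)
  intercept = ybar - slope * xbar

  # Error and total sums of squares, then the fit statistics.
  sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
  sst = sum((y - ybar) ** 2 for y in ys)
  r2 = 1.0 - sse / sst                   # fraction of variation explained
  r2_adj = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
  rmse = (sse / (n - p - 1)) ** 0.5      # root mean square error
  ```

  Adjusted R² penalizes extra model terms, which is why the slide's advice to "go for parsimony" matters: an insignificant factor can raise R² while lowering adjusted R².
  
  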
  • 50. 50 ANOVA Analysis •Summary of fit, whole model ANOVA, effect tests, etc.
  Summary of Fit: RSquare 0.926; RSquare Adj 0.915; Root Mean Square Error 3.864; Mean of Response 51.254; Observations (or Sum Wgts) 24
  Analysis of Variance: Model DF 3, Sum of Squares 3726.35, Mean Square 1242.12, F Ratio 83.19, Prob > F <.0001; Error DF 20, Sum of Squares 298.61, Mean Square 14.93; C. Total DF 23, Sum of Squares 4024.96
  Effect Tests: mode (DF 2) Sum of Squares 2802.55, F Ratio 93.85, Prob > F <.0001; image (DF 1) Sum of Squares 923.80, F Ratio 61.87, Prob > F <.0001
  [Plot: predicted vs. actual Mean(Score), 0-100, with bivariate normal ellipse P = 0.950 and linear fit]
  • 51. 51 Analysis: Logistic Regression •Raw score data used as regressor (X value) • Binary yes/no used as response (Y value) • Result, if decent, is a probability prediction • When score = xx, probability of acceptance = yy • Essentially, this is a calibration of the score scale • Area under blue line = prob. of NOT being acceptable • Area above blue line = prob. of being acceptable [Plot: probability of acceptance (0.00-1.00) vs. Score (0-100), with observed no/yes responses]
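  The calibration described above can be sketched as a one-predictor logistic regression fit by gradient ascent on the log-likelihood. The scores and yes/no responses below are hypothetical, and the hand-rolled fit stands in for whatever statistics package was actually used; the regressor is centered for numerical stability.

  ```python
  import math

  # Hypothetical raw scores (X) and binary acceptability responses (Y).
  scores = [20, 30, 40, 50, 55, 60, 65, 70, 75, 80, 90, 95]
  accept = [0,  0,  0,  0,  0,  1,  0,  1,  1,  1,  1,  1]

  b0, b1 = 0.0, 0.0
  center = 60.0        # center the regressor so the ascent is well behaved
  lr = 0.0005          # step size, small enough for stable ascent
  for _ in range(50000):
      g0 = g1 = 0.0
      for x, y in zip(scores, accept):
          p = 1.0 / (1.0 + math.exp(-(b0 + b1 * (x - center))))
          g0 += y - p                  # d(log-likelihood)/d(b0)
          g1 += (y - p) * (x - center) # d(log-likelihood)/d(b1)
      b0 += lr * g0
      b1 += lr * g1

  def p_accept(score):
      """Predicted probability that a sample at this score is acceptable."""
      return 1.0 / (1.0 + math.exp(-(b0 + b1 * (score - center))))
  ```

  The fitted curve answers "when score = xx, probability of acceptance = yy" directly, which is exactly the calibration of the score scale the slide describes.
  
  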
  • 52. 52 Variance Components Study using DASC Methodology Photo Media Artifact There was an unwanted artifact when printing on photo media Popular opinion dictated that it was a media-related problem Set up a study where 4 media types were printed across 3 randomly selected printers with a variety of image content [Variability chart: Magnitude (20-90) for Printer within Media] Variance components (% of total): Media 40%, Media*Printer 7%, Printer 42%, Within 12%
  • 53. 53 Logistic Regression of Photo Media Artifact Study Point of Subjective Equality ~72 on the Magnitude scale JND (as determined by the 75th minus the 50th percentiles) ~5 units Printer-to-printer differences could be as extreme as 40 Magnitude units
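  The PSE and JND quoted above can be read off a fitted logistic curve by inverting it: the PSE is the score at p = 0.5, and the JND (as the slide defines it) is the score at p = 0.75 minus the score at p = 0.50. The coefficients below are illustrative, chosen so the PSE lands near the slide's ~72; the actual fitted coefficients are not given on the slide.

  ```python
  import math

  # Hypothetical fitted coefficients for p(score) = 1/(1+exp(-(b0+b1*score))),
  # chosen so that -b0/b1 = 72 (the slide's approximate PSE).
  b0, b1 = -15.84, 0.22

  def score_at(p):
      """Invert the logistic curve: score where acceptance probability is p."""
      return (math.log(p / (1.0 - p)) - b0) / b1

  pse = score_at(0.50)          # point of subjective equality: -b0/b1
  jnd = score_at(0.75) - pse    # 75th minus 50th percentile: ln(3)/b1
  ```

  With these coefficients the JND comes out near 5 magnitude units, matching the slide; note that under this definition the JND depends only on the slope b1, so a steeper psychometric function means a smaller JND.
  
  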
  • 54. 54 Comparison of Psychometric Methods

  Criteria                 | Rank Order       | Category Rating | Paired Compr. | Rating Scales | Magnitude Estimation | Comrey Constant Sum PC | DASC
  Indirect / Direct        | Indirect         | Indirect        | Indirect      | Direct        | Direct               | Direct                 | Hybrid
  # Observers              | Mod              | High            | Low           | High/Mod      | Mod                  | Low                    | Mod/High
  # Presentations          | n                | n               | n(n-1)/2      | n             | n                    | n(n-1)/2               | n
  Range of Stimuli         | Mod              | Wide            | Small         | Wide          | Mod                  | Small                  | Small/Mod
  Numerical Scale          | Nominal/Interval | Interval        | Interval      | Interval      | Ratio                | Ratio                  | Interval
  Ability to Discriminate  | High             | Mod             | High          | Mod           | Mod                  | High                   | Mod
  Scale Variability        | Mod              | High            | Mod           | High          | High                 | Mod/High               | Mod
  Training Reqd.           | No               | No              | No            | No            | Yes                  | Yes                    | No
  Test Duration            | Short            | Short           | Mod           | Short         | Mod/High             | Mod                    | Mod
  • 55. 55 Acknowledgements Kodak Contributors: Dana Aultman*, Randy Dumas, Wayne Richard, Steve Billow References: Barker, Thomas, class notes from Empirical Modeling (CQAS 0875); Engeldrum, Peter G. (2000), Psychometric Scaling: A Toolkit for Imaging Systems Development, Imcotek Press; Miller, M., IST: Psychophysics, Eastman Kodak internal reference; Lodge, Milton (1981), Magnitude Scaling: Quantitative Measurement of Opinions, Sage Publications *Retired as of 12/07
  • 56. 56 Summary High Level Overview of Psychometric Theory Experimental Conditions Review of Different Methods (Including Trade-Offs) •Direct •Indirect •Hybrid Reviewed Some Practical Examples

Editor's Notes

  1. Trying to apply a scientific approach to studying the relationships between human perception and measured physical characteristics
  2. This leads to the development of the psychometric function, sometimes called the frequency-of-seeing curve (a cumulative distribution function). Visual evaluations are not always compared to physical measurements; they can be of 'feel' or of how photo-like a print is.
  3. Taken from my Empirical Modeling Notes from RIT