Development of health measurement scales – part II

Dr. Rizwan S A, M.D.
If you cannot express in numbers something that you are describing, you probably
have little knowledge about it.
1
Scaling responses
• Categorical
• Continuous
– Direct Estimation Method
• Visual Analogue Scale
• Adjectival Scale
– Discrete
– Continuous

– Specific Scaling
• Likert Scale
• Semantic Scale

– Comparative Method
• Thurstone's Method
• Paired Comparison Method
• Guttman Method

– Econometric Method
2
Outline
• Reliability
• Validity
• Measuring change
• Conclusions
• Article discussion

3
Variance = sum of (individual value – mean value)² / no. of values

4
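As a quick check, here is a minimal Python sketch of this formula; the values are the summed scale scores from the alpha example later in the deck, and note that the formula above divides by the number of values while the later worked example uses n – 1:

```python
values = [2, 3, 0, 3, 2]    # e.g. the summed scale scores from the alpha example below
mean = sum(values) / len(values)

# Variance exactly as defined above: sum of squared deviations / no. of values
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(variance)    # 1.2; dividing by n - 1 instead (as the alpha example does) gives 1.5
```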
Reliability
• Whether our tool is measuring the attribute in a
reproducible fashion or not
• A way to show the amount of error (random and
systematic) in any measurement
• Sources of error – observers, instruments, instability
of the attribute
• Day to day encounters
– Weighing machine, watch, thermometer

5
Assessing Reliability
• Internal Consistency
– The average correlation among all the items in the tool
• Item-total correlation
• Split half reliability
• Kuder-Richardson 20 & Cronbach's alpha
• Multifactor inventories

• Stability
– Reproducibility of a measure on different occasions
• Inter-Observer reliability
• Test-Retest reliability (Intra-Observer reliability)
6
Internal consistency
• All items in a scale tap different aspects of the same
attribute and not different traits
• Items should be moderately correlated with each other, and each item with the total
• Two schools of thought
– If the aim is to describe a trait/behaviour/disorder
– If the aim is to discriminate people with the trait from those without

• The trend is towards scales that are more internally consistent
• IC doesn't apply to multidimensional scales

7
Item-total correlation
• Oldest, still used
• Correlation of each item with the total score w/o that
item
• For k items, k correlations have to be calculated, which is laborious
• An item should be discarded if r < 0.20
• Best is Pearson's R; for dichotomous items, use the point-biserial correlation

8
Split half reliability
• Divide the items into two halves and calculate the correlation between them
• Underestimates the true reliability because we are
reducing the length of scale to half (r is directly
related to the no. of items)
– Corrected by Spearman-Brown formula

• Should not be used in
– Highly timed achievement tests
– Chained items
10
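A minimal sketch of split-half reliability with the Spearman-Brown correction; the item responses and the odd/even split are illustrative, not from the deck:

```python
import numpy as np

# Hypothetical dichotomous responses: rows = respondents, columns = 6 items
items = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
])

odd_half  = items[:, 0::2].sum(axis=1)    # items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)    # items 2, 4, 6

r_half = np.corrcoef(odd_half, even_half)[0, 1]
split_half = 2 * r_half / (1 + r_half)    # Spearman-Brown correction back to full length
print(round(split_half, 2))
```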
KR-20/Cronbach's alpha
• KR-20 for dichotomous responses
• Cronbach's alpha for more than two responses
• They give the average of all possible split half reliabilities of a
scale
• If removing an item increases the coeff. it should be discarded
• Problems
– Depends on the no. of items
– A scale with two different sub-scales will probably yield a high alpha
– A very high alpha denotes redundancy (asking the same question in
slightly different ways)
– Thus alpha should be more than 0.70 but not more than 0.90
11
• Cronbach's basic equation for alpha:
alpha = [n / (n – 1)] × (1 – ΣVi / Vtest)
– n = number of questions
– Vi = variance of scores on each question
– Vtest = total variance of overall scores on the entire test
13
Calculation of Cronbach's coefficient alpha
Example: Assessment of emotional health

During the past month (Yes = 1, No = 0):
• Have you been a very nervous person?
• Have you felt downhearted and blue?
• Have you felt so down in the dumps that nothing could cheer you up?

14
Results

Patient               | Item 1 | Item 2 | Item 3 | Summed scale score
1                     | 0      | 1      | 1      | 2
2                     | 1      | 1      | 1      | 3
3                     | 0      | 0      | 0      | 0
4                     | 1      | 1      | 1      | 3
5                     | 1      | 1      | 0      | 2
Percentage positive   | 3/5=.6 | 4/5=.8 | 3/5=.6 | mean score = 2

15
Calculations
Mean score = 2

Sample variance = [(2 – 2)² + (3 – 2)² + (0 – 2)² + (3 – 2)² + (2 – 2)²] / (5 – 1) = 1.5

CC alpha = [k / (k – 1)] × [1 – Σ (% pos)i (% neg)i / Var]
         = (3 / 2) × [1 – ((.6)(.4) + (.8)(.2) + (.6)(.4)) / 1.5]
         = 0.86

Conclude that this scale has good reliability
16
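A short Python sketch of the same KR-20 calculation (the dichotomous-item special case of Cronbach's alpha), reproducing the slide's example; the variable names are just illustrative:

```python
# Scored item responses from the table above: rows = patients, columns = items
items = [
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
k = len(items[0])                                   # 3 items
totals = [sum(row) for row in items]                # summed scale scores: 2, 3, 0, 3, 2
n = len(totals)
mean_total = sum(totals) / n                        # 2.0

# Total variance of the summed scores (n - 1 in the denominator, as on the slide)
var_total = sum((t - mean_total) ** 2 for t in totals) / (n - 1)        # 1.5

# Item variances for dichotomous items: proportion positive x proportion negative
p = [sum(row[i] for row in items) / n for i in range(k)]                # .6, .8, .6
sum_pq = sum(pi * (1 - pi) for pi in p)                                 # 0.64

kr20 = (k / (k - 1)) * (1 - sum_pq / var_total)
print(round(kr20, 2))   # 0.86, matching the slide
```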
Multifactor inventories
• More sophisticated techniques
• Item-total procedure – each item should correlate
with the total of its scale and the total of all the scales
• Factor analysis
– Determining the underlying factors
– For example, if there are five tests
• Vocabulary, fluency, phonetics, reasoning and arithmetic
• We can theorize that the first three would be correlated
under a factor called 'verbal factor' and the last two
under a 'logic factor'
20
Stability/ Measuring error
• A weighing machine shows weights in a range of say
40-80 kg; whether an error of 1 kg is meaningful depends on that spread
• A ratio will therefore be more useful:
measurement error / total variability between subjects

But in reality we calculate the ratio
variability between subjects / total variability
(Total variability includes subjects and measurement error)

• So that a ratio of
– 1 indicates no measurement error/perfect reliability
– 0 indicates otherwise
21
• Reliability =
subj. variability / (subj. variability + measurement error)
• Statistically, 'variance' is the measure of variability, so
• Reliability =
SD² of subjects / (SD² of subjects + SD² of error)
• Thus reliability is the proportion of the total variance that
is due to the 'true' differences between the subjects

• Reliability has meaning only when applied to specific
populations
22
1. Measurement error / total variability between subjects
2. Variability between subjects / total variability
3. Subj. variability / (subj. variability + measurement error)
4. SD² of subjects / (SD² of subjects + SD² of error)

23
Calculation of reliability
• The statistical technique used is ANOVA and
since we have repeated measurements in
reliability, the method is
– repeated measures ANOVA

24
25
Example

26
27
• Classical definition of reliability
• Interpretation is that 88% of the variance is
due to the true variance among patients (aka
Intraclass Correlation Coefficient, ICC)
28
Fixed/random factor
• What happened to the variance due to observers?
• Are the same observers going to be used, or are they a random sample?

• Another situation where observations may be treated as
fixed is subjects answering the same items on a scale
29
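A sketch of how the ICC can be obtained from a two-way repeated-measures ANOVA, treating observers either as a random sample (Shrout-Fleiss ICC(2,1)) or as fixed (ICC(3,1)); the ratings are illustrative, since the deck's own example was shown as an image:

```python
import numpy as np

# Illustrative ratings: rows = patients, columns = observers
scores = np.array([
    [6., 7., 8.],
    [4., 5., 6.],
    [2., 2., 3.],
    [8., 9., 9.],
    [5., 6., 7.],
])
n, k = scores.shape
grand = scores.mean()

# Sums of squares for a two-way repeated-measures ANOVA (patients x observers)
ss_total     = ((scores - grand) ** 2).sum()
ss_patients  = k * ((scores.mean(axis=1) - grand) ** 2).sum()
ss_observers = n * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_error     = ss_total - ss_patients - ss_observers

ms_patients  = ss_patients / (n - 1)
ms_observers = ss_observers / (k - 1)
ms_error     = ss_error / ((n - 1) * (k - 1))

# Observers treated as a random sample (absolute agreement), Shrout-Fleiss ICC(2,1)
icc_random = (ms_patients - ms_error) / (
    ms_patients + (k - 1) * ms_error + k * (ms_observers - ms_error) / n)

# Observers treated as fixed (consistency), Shrout-Fleiss ICC(3,1)
icc_fixed = (ms_patients - ms_error) / (ms_patients + (k - 1) * ms_error)

print(round(icc_random, 2), round(icc_fixed, 2))
```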
Other types of reliability
• We have only examined the effect of different
observers on the same behaviour
• But there can be error due to 'day to day' differences;
if we measure the same behaviour a week or two
apart we can calculate the 'intra-observer reliability
coefficient'
• If there are no observers (self-rated tests) we can still
calculate 'test-retest reliability'

31
• Usually high inter-observer is sufficient, but if it is
low then we may have to calculate intra-observer
reliability to determine the source of unreliability
• Mostly, measures of internal consistency are reported
as 'reliability', because they are easily computed in a
single sitting
– Hence caution is required as they may not measure
variability due to day to day differences

32
Diff. forms of reliability coefficient
• So far we have seen forms of ICC
• Others
– Pearson product-moment correlation
– Cohen's kappa
– Bland-Altman analysis

33
Pearson's correlation
• Based on regression – the extent to which the relation
between two variables can be described by a straight
line

34
Limitations of Pearson's R
• A perfect fit of 1.0 may be obtained even if the intercept
is non-zero and the slope is not equal to one, unlike with
the ICC
• So Pearson's R will be higher than the true reliability, but in practice it
is usually close to the ICC, as the predominant source of error
is random variation
• If there are multiple observers then multiple pairwise
Rs are required, unlike the single ICC
• For example, with 10 observers there will be 45 Pearson's Rs
whereas only one ICC
35
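A small illustration of the first limitation: with a constant shift between observers, Pearson's R is still perfect even though the two never agree exactly (illustrative data):

```python
import numpy as np

# An observer who always scores 5 points higher than another still yields a
# perfect Pearson correlation, although the two never actually agree
obs1 = np.array([10., 12., 15., 20., 25.])
obs2 = obs1 + 5.0                                 # systematic, non-random shift

print(round(np.corrcoef(obs1, obs2)[0, 1], 4))    # 1.0 - the constant bias that would lower the ICC is invisible
```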
Kappa coeff.

36
kappa = (0.70 – 0.41) / (1 – 0.41) = 0.49

37
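A minimal sketch of the kappa formula used above, taking the observed and chance-expected agreement directly from the slide (the underlying 2×2 table was shown as an image):

```python
def cohens_kappa(p_observed: float, p_expected: float) -> float:
    """Chance-corrected agreement: kappa = (po - pe) / (1 - pe)."""
    return (p_observed - p_expected) / (1 - p_expected)

# Observed agreement 0.70 and chance-expected agreement 0.41, as on the slide
print(round(cohens_kappa(0.70, 0.41), 2))   # 0.49
```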
38
• Used when responses are dichotomous/categorical
• When the frequency of positive results is very low or very high,
kappa will be very low even if observed agreement is high
• Weighted kappa focuses on disagreement, cells are weighted
according to the distance from the diagonal of agreement

• Weighting can be arbitrary or using quadratic weights (based
on square of the amount of discrepancy)
• Quadratic scheme of weighted kappa is equivalent to ICC
• Also, the unweighted kappa is equal to ICC based on ANOVA
39
Bland and Altman method
• A plot of difference between two observations
against the mean of the two observations

40
41
• The mean difference is related to the observer variance in ICC, and the SD of the
differences is related to the error variance in ICC
• Limits of agreement are calculated as the mean difference ± 2 SD of the
differences (corresponding to the error variance)

• Agreement is expressed as the ‘limits of agreement’. The
presentation of the 95% limits of agreement is for visual judgement
of how well two methods of measurement agree. The smaller the
range between these two limits the better the agreement is.
• The question of how small is small depends on the clinical context:
would a difference between measurement methods as extreme as
that described by the 95% limits of agreement meaningfully affect
the interpretation of the results?
• Limitation - the onus is placed on the reader to juxtapose the
calculated error against some implicit notion of true variability
42
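A minimal sketch of the Bland-Altman calculation with hypothetical paired measurements (the data are illustrative):

```python
import numpy as np

# Hypothetical paired measurements from two methods on the same subjects
method_a = np.array([10.2, 11.5, 9.8, 12.0, 10.9, 11.2])
method_b = np.array([10.6, 11.1, 10.3, 12.4, 10.5, 11.8])

diff = method_a - method_b
mean_diff = diff.mean()         # systematic bias between the methods
sd_diff = diff.std(ddof=1)      # spread of the disagreements

lower, upper = mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff
print(f"bias = {mean_diff:.2f}, 95% limits of agreement = ({lower:.2f}, {upper:.2f})")
# Plotting diff against (method_a + method_b) / 2 gives the Bland-Altman plot itself
```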
Issues in Interpretation
SE of measurement and reliability
• R is a dimension-less ratio of variances and so it is difficult to
interpret R in terms of an individual score
• SEM = σ sqrt(1-R)
• If we knew someone's true score, we could estimate the
limits within which the observed value would lie 68% or
95% of the time
• E.g. a scale with SD 10 and R 0.8: if the true score were 15, then
68% of the time the observed value would fall between
10.5 and 19.5

43
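A quick check of the slide's SEM example in Python:

```python
import math

# SD = 10, reliability R = 0.8, true score = 15 (values from the slide)
sd, r, true_score = 10.0, 0.8, 15.0
sem = sd * math.sqrt(1 - r)     # standard error of measurement, about 4.5

# About 68% of observed scores fall within one SEM of the true score
print(round(true_score - sem, 1), round(true_score + sem, 1))   # 10.5 19.5
```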
Standards for magnitude of reliability coeff.
• How much reliability is good? Kelly (0.94), Stewart (0.85)
• A test for individual judgment should be higher
than that for research in groups
• For Research purposes –
– Mean score and the sample size will reduce the error
– Conclusions are usually made after a series of studies
– Acceptable reliability is dependent on the sample size
in research
45
Reliability and probability of misclassification
• Depends on the property of the instrument and the
decision of cut point
• Relation between reliability and likelihood of
misclassification
– Eg. A sample of 100, one person ranked 25th and another
50th
– If the R is 0, 50% chance that the two will reverse order on
retesting
– If R is 0.5, 37% chance, with R=0.8, 2.2% chance

• Hence R of 0.75 is minimum requirement for a useful
instrument
46
Improving reliability
• Increase the subject variance relative to the error
variance (by legitimate means and otherwise)
• Reducing error variance
– Observer/rater training
– Removing consistently extreme observers
– Designing better scales

• Increasing true variance
– In case of a 'floor' or 'ceiling' effect, introduce items that
will bring the performance to the middle of the scale (thus
increasing true variance)
• Eg. Fair-good-very good-excellent (instead of bad-good)
47
• Ways that are not legitimate
– Test the scale in a heterogeneous population
(normal and bedridden arthritics)
– A scale developed in a homogeneous population will
show a larger reliability when used in a
heterogeneous population
• correct for attenuation

48
• Simplest way to increase R is to increase the no. of
items
• True variance increases as the square of the number of
items whereas error variance increases only as the no. of
items
• If the length of the test is tripled
– Then R(Spearman-Brown) = 3R / (1 + 2R)

49
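A minimal sketch of the Spearman-Brown prophecy formula in its general form; the starting reliability of 0.6 is an illustrative value, not from the deck:

```python
def spearman_brown(r: float, length_factor: float) -> float:
    """Predicted reliability when the test length is multiplied by length_factor."""
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Tripling a test (length_factor = 3) that starts with R = 0.6 (illustrative value)
print(round(spearman_brown(0.6, 3), 2))   # 0.82
```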
• In reality the equation overestimates the new
reliability
• We can also use this equation to determine the
length of a test for achieving a pre-decided
reliability
• To improve test-retest reliability – shorten the
interval between the tests
• An ideal approach is to examine all the sources
of variation and try to reduce the larger ones
(generalizability theory)
50
Sample size for reliability studies

51
Summary for Reliability
• Pearson R is theoretically incorrect but in
practice fairly close
• Bland and Altman method is analogous to
error variance of the ICC but doesn't relate this to
the range of observations
• kappa and ICC are identical and most
appropriate

52
Generalizability theory
• Backdrop of classical test theory
– All variance in scores can be divided into true and error
variance (an overly simplistic assumption)
– Doesn't exhaust all possible sources of variance
– Doesn't account for interaction between sources of error
variance

• G theory
– Cronbach et al 1972
– Essence is the recognition that in any measurement
situation there are multiple sources of error variance (may
be infinite)
53
Reliability Vs. Validity

60
Validity
• Two steps to determine usefulness of a scale
– Reliability – necessary but not sufficient
– Validity – next step

• Validity – is the test measuring what it is meant to measure?
• Two important issues
– The nature of what is being measured
• Temperature Vs. quality of life/social support (physical vs. abstract)
– Relation to the purported cause
• Sr. creatinine is a measure of kidney func. because we know it is regulated
by the kidneys
• But whether students who do volunteer work will become better doctors?

• Since our understanding of human behaviour is far from
perfect, such predictions have to be validated against actual
performance
61
(Illustration: 32 degrees Celsius vs. a depression score of 32 – a physical measure vs. an abstract one)

62
Types of validity
• Three Cs (conventionally)
– Content
– Criterion
• Concurrent
• Predictive

– Construct
– Others (face validity)

• New types
– Convergent, discriminant, trait etc.,
63
Differing perspectives
• Previously validity was seen as demonstrating the properties of
the scale
• Current thinking - what inferences can be made about the
people that have given rise to the scores on these scales?
– Thus validation is a process of hypothesis testing (e.g. someone who scores
high on test A will do worse on test B, and will differ from people who do
better on tests C and D)
– Researchers are only limited by their imagination to devise experiments
to test such hypotheses

• All types of validity are addressing the same issue of the
degree of confidence we can place in the inferences we can
draw from the scales
64
• Face validity
– On the face of it the tool appears to be measuring what it is
supposed to measure
– Subjective judgment by one/more experts, rarely by
empirical means

• Content validity
– Measures whether the tool includes all relevant domains or
not
– Closely related to face validity
– a.k.a. 'validity by assumption', because an expert says so

• Certain situations where these may not be desired - ?
65
Content validity
• Example – cardiology exam;
– Assume it contains all aspects of the circulatory
system (physiology, anatomy, pathology, pharmacology, etc.)
– If a person scores high on this test, we can 'infer'
that he knows much about the subject (i.e., our
inferences about the person will be right across various
situations)
– In contrast, if the exam did not contain anything about
circulation, the inferences we make about a high scorer
may be wrong most of the time and vice versa
66
• Generally, a measure that includes a more representative
sample of the target behaviour will have more content validity
and hence lead to more accurate inferences
• Reliability places an upper limit on validity (the maximum
validity is the square root of the reliability coeff.); the higher the
reliability, the higher the maximum possible validity
– One exception is the relation between internal consistency and
validity (better to sacrifice IC for content validity)
– The ultimate aim of scale is inferential which depends more
on content validity than internal consistency

67
Criterion validity
• Correlation of a scale with an accepted 'gold standard'
• Two types
– Concurrent (both the new scale and standard scale are given at the
same time)
– Predictive – the GS results will be available some time in the future
(eg. Entrance test for college admission to assess if a person will
graduate or not)
• Why develop a new scale when we already have a criterion scale?
– Diagnostic utility/substitutability
– Predictive utility (no decision can be made on the basis of new
scale)
• Criterion contamination
– If the result of the GS is in part determined in some way by the
results of the new test, it may lead to an artificially high correlation
68
Construct validity
• Height, weight – readily observable
• Psychological variables – anxiety, pain, intelligence – are abstract
and can't be directly observed
• For example, anxiety – we say that a person has anxiety if he has
sweaty palms, tachycardia, pacing back and forth, difficulty in
concentrating, etc. (i.e., we hypothesize that these
symptoms are the result of anxiety)
• Such proposed underlying factors are called hypothetical
constructs/ constructs (eg. Anxiety, illness behaviour)
• Such constructs arise from larger theories/ clinical
observations
• Most psychological instruments tap some aspect of construct
69
(Diagram: the construct 'rheumatoid arthritis' inferred from indicators such as
early morning stiffness, involvement of 3 or more joints (especially small joints),
elevated ESR and RA factor, and X-ray changes)
70
Establishing construct validity
• IBS is a construct rather than a disease – it is a
diagnosis of exclusion
• A large vocabulary, wide knowledge and
problem solving skills – what is the underlying
construct?
• Many clinical syndromes are constructs rather
than actual entities (schizophrenia, SLE)

71
• Initial scales for IBS – ruling out other organic
diseases and some physical signs and symptoms
– These scales were inadequate because they led to
many missed and wrong diagnoses
– New scales were developed incorporating demographic
features and personality features

• Now how to assess the validity of this new scale
– Based on my theory, high scorers on this scale should
have
• Symptoms which will not clear with conventional therapy
• Lower prevalence of organic bowel disease on autopsy

72
Differences from other types
1. Content and criterion can be established in one or two
studies, but there is no single experiment that can prove a
construct
• Construct validation is an ongoing process, learning more
about the construct, making new predictions and then
testing them
• Each supportive study strengthens the construct but one
well designed negative study can question the entire
construct
2. We are assessing the theory as well as the measure at the
same time
73
IBS example
• We had predicted that IBS patients will not respond to
conventional therapy
• Assume that we gave the test to a sample of patients
with GI symptoms and treated them with conventional
therapy
• If high scoring patients responded in the same
proportion as low scorers then there are 3 possibilities
– Our scale is good but theory wrong
– Our theory is good but scale bad
– Both scale and theory are bad

• We can identify the reason only from further studies
74
• If an experimental design is used to test the
construct, then in addition to the above
possibilities our experiment may be flawed
• Ultimately, construct validity doesn't differ
conceptually from other types of validity
– "All validity is at its base some form of construct
validity… it is the basic meaning of validity"
(Guion)
75
Establishing construct validity
• Extreme groups
• Convergent and discriminant validity
• Multitrait-multimethod matrix

76
Extreme groups
• Two groups – as decided by clinicians
– One IBS and the other some other GI disease
– Equivocal diagnosis eliminated

• Two problems
– That we are able to separate two extreme groups implies
that we already have a tool which meets our needs
(however we can do bootstrapping)
– This is not sufficient, the real use of a scale is making much
finer discriminations. But such studies can be a first step, if
the scale fails this it will be probably useless in practical
situations
77
• Convergent validity - If there are two measures for
the same construct, then they should correlate with
each other but should not correlate too much.
E.g. Index of anxiety and ANS awareness index
• Divergent validity – the measure should not correlate
with a measure of a different construct, eg. Anxiety
index and intelligence index

78
Multitrait-multimethod matrix
• Two unrelated traits/constructs each measured by two different methods
• Eg. Two traits – anxiety, intelligence; two methods – a rater, exam
                       Anxiety            Intelligence
                       Rater     Exam     Rater     Exam
Anxiety       Rater    0.53
              Exam     0.42      0.79
Intelligence  Rater    0.18      0.17     0.58
              Exam     0.15      0.23     0.49      0.88

Diagonal values (0.53, 0.79, 0.58, 0.88) – reliabilities of the four instruments (should be highest)
Homotrait–heteromethod corr. (0.42, 0.49) – convergent validity
Heterotrait–homomethod corr. (0.18, 0.23) – divergent validity
Heterotrait–heteromethod corr. (0.17, 0.15) – should be lowest

• Very powerful method but very difficult to get such a combination

79
Biases in validity assessment
• Restriction in range
• May be in new scale (MAO level)
• May be in criterion (depression score)
• A third variable correlated to both (severity)

• E.g. a high correlation was found between
MAO levels and depression scores in a
community-based study, but on replicating the
study in a hospital the correlation was low
80
81
82
Measuring change
• Ultimate goal of most treatment studies is to
induce a change in the patient‘s status
• Controversial views against and for scales which
are more sensitive to change in health status
• Goals of measuring change
– To distinguish between those individuals who change
a lot and those who change little
– To identify correlates of change
– To infer treatment effects from group differences
83
• It is easier to demonstrate a consistent change
in all the subjects, rather than different
amounts of change in different subjects
• Why don't we measure change directly?
– i.e., ask patients how they have changed since they
were put on the treatment; this fails because people simply
do not remember how they were at the beginning
(the validity of such responses is debatable)
– Most defensible way to assess change is to
measure it directly at the beginning of the study
and subsequently on one or more occasions
84
Measures of association
• Reliability
• Sensitivity to change

85
Reliability of change score

86
Sensitivity to change from
treatment effects

87
Item response theory
• Limitations of G theory
– Subject/population specific
– Difficult to compare a person's score on two or more
different tests (converting to z scores assumes normality,
which is not always correct)
– Homoscedasticity assumption that errors are the same at
the ends as in the middle range of scores
– Assumption that all items have equal valences
• Classical test theory – difficult to separate the properties of the test
from the attributes of the people taking it – the tool's properties
change as the people tested change, and the people's properties change
as the test changes
91
• IRT – claims to rectify these limitations
– Based on two 'hard assumptions'
1. Data are unidimensional (tap only one trait)
2. The probability of answering any item in positive
direction is unrelated to the probability of answering
any other item positively for people with the same
amount of the trait (local independence)

– Two postulates
1. Performance of a subject can be predicted by a set of
factors called 'traits' or abilities, latent traits (theta)
2. The relationship between a person's performance on
any item and the underlying trait can be described by
an 'item characteristic curve'
92
• Some important properties of ICC
– They are 'ogives', usually
– Monotonic; the prob. of answering in a positive
direction consistently increases as the score on the
trait increases
– Differ from each other in three dimensions
• Slope
• Location along the trait
• The flattening out at the bottom

– Can be thought of as 'imperfect' Guttman scales

93
Item characteristic curve

• Q. A is a better discriminator than Q. B
• Q. B is harder than Q. A

94
Different models of ICC
• One parameter model (Rasch model)
– Assumes that all items have equal discriminating ability but
different difficulty

95
• Two parameter model
– Assumes that both discriminating ability and
difficulty differ

96
• Three parameter model
– In addition to the two parameter the lower end of the tail
asymptotes at some probability greater than 0
– Takes care of the fact that when people answer questions by
guessing/ items that are correct by chance

97
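A minimal sketch of the three logistic ICC models just described; theta is the latent trait, b the difficulty, a the discrimination and c the guessing asymptote (the parameter values below are illustrative):

```python
import math

def icc_1pl(theta: float, b: float) -> float:
    """Rasch / one-parameter model: items differ only in difficulty."""
    return 1 / (1 + math.exp(-(theta - b)))

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter model: discrimination and difficulty both vary."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter model: a lower asymptote c > 0 allows for guessing."""
    return c + (1 - c) * icc_2pl(theta, a, b)

# For the same person (theta = 0), a harder item (b = 1) gives a lower
# probability of a positive answer than an easier one (b = -1)
print(round(icc_1pl(0, -1), 2), round(icc_1pl(0, 1), 2))   # 0.73 0.27
```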
Deriving the curves
• Taking a large number of subjects (200 for
one-parameter model, 1000 to estimate the 3
parameter model)
• Random sampling Vs. latent trait model
– In random sampling – it is not necessary to know
much about the items, but a large pool of items is
required
– Latent trait model – fewer items are required but
every item should be known in detail
98
Advantages and disadvantages
• Allows test-free measurement; people can be
compared to one another even if they took different
items
• Eg. Wide Range Achievement Test
• People in different levels can be given different items
and yet be placed on the same scale at the end
(adaptive/tailored testing)
• Not widely used because
– Large sample size needed to estimate the parameters
– Assumptions are difficult to meet
99
Future guidelines for developing health
measurements
1. Articles/manuals should give full description of purpose,
population, intended use
2. Rationale for design of the instrument – conceptual
definition of the object of measurement
3. Describe the ways in which questions were selected
4. Revisions if any should be stated along with reliability and
validity
5. Clear instructions for standard administration and scoring
6. Reliability and validity testing should examine both
internal structure and its relation to alternative
measurements of the concept
7. The tool should be tested by users other than the original
authors
100
Critical appraisal – RCQ-36
• What is the population in this study?
• What is the type of scale?
• What is the scaling method used?
• Have they missed any method for item generation?
• Is Cronbach's alpha calculated appropriately and is the
scale reliable?
• Is it appropriate to calculate mean (SD) for each domain?
• Have they established construct validity in this study?
Comment on the MTMM matrix used.
• Can this scale be used to measure treatment effects for
RC?
101
Thank you
"Belief is no substitute for arithmetic"
— Henry Spencer

102
Scaling response (summary)
• Categorical
• Continuous
– Direct estimation: VAS; Adjectival scale (discrete, continuous)
– Comparative methods
– Econometric methods
– Specific scaling: Likert
103

Weitere ähnliche Inhalte

Was ist angesagt?

DEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRY
DEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRYDEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRY
DEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRYPawan Sharma
 
Measurement in social science research
Measurement in social science research Measurement in social science research
Measurement in social science research Yagnesh sondarva
 
Mba2216 week 07 08 measurement and data collection forms
Mba2216 week 07 08 measurement and data collection formsMba2216 week 07 08 measurement and data collection forms
Mba2216 week 07 08 measurement and data collection formsStephen Ong
 
Measurement and scaling
Measurement and scalingMeasurement and scaling
Measurement and scalingJithin Thomas
 
Topic 7 measurement in research
Topic 7   measurement in researchTopic 7   measurement in research
Topic 7 measurement in researchDhani Ahmad
 
Concept of Measurements in Business Research
Concept of Measurements in Business ResearchConcept of Measurements in Business Research
Concept of Measurements in Business ResearchCS PRADHAN
 
Research methodology measurement
Research methodology measurement Research methodology measurement
Research methodology measurement 49bhu
 
Measurement in research
Measurement in researchMeasurement in research
Measurement in researchBikram Pradhan
 
Measurement, scaling and sampling
Measurement, scaling and samplingMeasurement, scaling and sampling
Measurement, scaling and samplingRajThakuri
 
Scale construction babita
Scale construction babitaScale construction babita
Scale construction babitaBabita Thapa
 
Guidelines in writing items for noncognitive measures
Guidelines in writing items for noncognitive measuresGuidelines in writing items for noncognitive measures
Guidelines in writing items for noncognitive measuresCarlo Magno
 
Attitude measurement and scaling techniques
Attitude measurement and scaling techniquesAttitude measurement and scaling techniques
Attitude measurement and scaling techniquesCharu Rastogi
 
Attitude scales ppt
Attitude scales pptAttitude scales ppt
Attitude scales pptpranveer123
 

Was ist angesagt? (20)

DEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRY
DEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRYDEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRY
DEVELOPMENT AND EVALUATION OF SCALES/INSTRUMENTS IN PSYCHIATRY
 
Likert scale
Likert scaleLikert scale
Likert scale
 
Attitude Scales
Attitude ScalesAttitude Scales
Attitude Scales
 
Chapter 7
Chapter 7Chapter 7
Chapter 7
 
Measurement in social science research
Measurement in social science research Measurement in social science research
Measurement in social science research
 
Mba2216 week 07 08 measurement and data collection forms
Mba2216 week 07 08 measurement and data collection formsMba2216 week 07 08 measurement and data collection forms
Mba2216 week 07 08 measurement and data collection forms
 
Measurement and scaling
Measurement and scalingMeasurement and scaling
Measurement and scaling
 
Topic 7 measurement in research
Topic 7   measurement in researchTopic 7   measurement in research
Topic 7 measurement in research
 
Monika seminar
Monika seminarMonika seminar
Monika seminar
 
Measurement
MeasurementMeasurement
Measurement
 
Concept of Measurements in Business Research
Concept of Measurements in Business ResearchConcept of Measurements in Business Research
Concept of Measurements in Business Research
 
Research methodology measurement
Research methodology measurement Research methodology measurement
Research methodology measurement
 
Reliability Analysis
Reliability AnalysisReliability Analysis
Reliability Analysis
 
Measurement in research
Measurement in researchMeasurement in research
Measurement in research
 
Measurement, scaling and sampling
Measurement, scaling and samplingMeasurement, scaling and sampling
Measurement, scaling and sampling
 
Scale construction babita
Scale construction babitaScale construction babita
Scale construction babita
 
Attitude scales
Attitude scalesAttitude scales
Attitude scales
 
Guidelines in writing items for noncognitive measures
Guidelines in writing items for noncognitive measuresGuidelines in writing items for noncognitive measures
Guidelines in writing items for noncognitive measures
 
Attitude measurement and scaling techniques
Attitude measurement and scaling techniquesAttitude measurement and scaling techniques
Attitude measurement and scaling techniques
 
Attitude scales ppt
Attitude scales pptAttitude scales ppt
Attitude scales ppt
 

Andere mochten auch

Test production process - Approaches to language testing - Techniques of lang...
Test production process - Approaches to language testing - Techniques of lang...Test production process - Approaches to language testing - Techniques of lang...
Test production process - Approaches to language testing - Techniques of lang...Phạm Phúc Khánh Minh
 
Reliability, validity, generalizability and the use of multi-item scales
Reliability, validity, generalizability and the use of multi-item scalesReliability, validity, generalizability and the use of multi-item scales
Reliability, validity, generalizability and the use of multi-item scalesdakter Cmc
 
Item Response Theory in Constructing Measures
Item Response Theory in Constructing MeasuresItem Response Theory in Constructing Measures
Item Response Theory in Constructing MeasuresCarlo Magno
 
IRT - Item response Theory
IRT - Item response TheoryIRT - Item response Theory
IRT - Item response TheoryAjay Dhamija
 
BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...
BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...
BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...Dr Ritesh Malik
 
Reliability in Language Testing
Reliability in Language Testing Reliability in Language Testing
Reliability in Language Testing Seray Tanyer
 
Health policy and planning
Health policy and planning Health policy and planning
Health policy and planning Rizwan S A
 
District and PHC level health planning
District and PHC level health planningDistrict and PHC level health planning
District and PHC level health planningRizwan S A
 
Health care delivery in India
Health care delivery in IndiaHealth care delivery in India
Health care delivery in IndiaRizwan S A
 
Water Purification - Part 1
Water Purification - Part 1Water Purification - Part 1
Water Purification - Part 1Rizwan S A
 
Vitamins and minerals
Vitamins and mineralsVitamins and minerals
Vitamins and mineralsRizwan S A
 
Community Nutrition Programmes in India
Community Nutrition Programmes in IndiaCommunity Nutrition Programmes in India
Community Nutrition Programmes in IndiaRizwan S A
 
Achievement test
Achievement testAchievement test
Achievement testFatin Idris
 
Chapter 3 Notes-Environmental Health
Chapter 3 Notes-Environmental HealthChapter 3 Notes-Environmental Health
Chapter 3 Notes-Environmental Healthduncanpatti
 
Planning an achievement test and assessment
Planning an achievement test and assessmentPlanning an achievement test and assessment
Planning an achievement test and assessmentUmair Ashraf
 
Some concepts in health
Some concepts in healthSome concepts in health
Some concepts in healthRizwan S A
 

Andere mochten auch (20)

Test production process - Approaches to language testing - Techniques of lang...
Test production process - Approaches to language testing - Techniques of lang...Test production process - Approaches to language testing - Techniques of lang...
Test production process - Approaches to language testing - Techniques of lang...
 
Week 8 & 9 - Validity and Reliability
Week 8 & 9 - Validity and ReliabilityWeek 8 & 9 - Validity and Reliability
Week 8 & 9 - Validity and Reliability
 
Test planning
Test planningTest planning
Test planning
 
Reliability, validity, generalizability and the use of multi-item scales
Reliability, validity, generalizability and the use of multi-item scalesReliability, validity, generalizability and the use of multi-item scales
Reliability, validity, generalizability and the use of multi-item scales
 
MCQ Workshop - Dr Jane Holland
MCQ Workshop - Dr Jane HollandMCQ Workshop - Dr Jane Holland
MCQ Workshop - Dr Jane Holland
 
Item Response Theory in Constructing Measures
Item Response Theory in Constructing MeasuresItem Response Theory in Constructing Measures
Item Response Theory in Constructing Measures
 
IRT - Item response Theory
IRT - Item response TheoryIRT - Item response Theory
IRT - Item response Theory
 
BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...
BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...
BCIL (Department of Science & Technology) Entrepreneurship & Startup Mentorin...
 
Reliability in Language Testing
Reliability in Language Testing Reliability in Language Testing
Reliability in Language Testing
 
Health policy and planning
Health policy and planning Health policy and planning
Health policy and planning
 
Test Planning
Test PlanningTest Planning
Test Planning
 
District and PHC level health planning
District and PHC level health planningDistrict and PHC level health planning
District and PHC level health planning
 
Health care delivery in India
Health care delivery in IndiaHealth care delivery in India
Health care delivery in India
 
Water Purification - Part 1
Water Purification - Part 1Water Purification - Part 1
Water Purification - Part 1
 
Vitamins and minerals
Vitamins and mineralsVitamins and minerals
Vitamins and minerals
 
Community Nutrition Programmes in India
Community Nutrition Programmes in IndiaCommunity Nutrition Programmes in India
Community Nutrition Programmes in India
 
Achievement test
Achievement testAchievement test
Achievement test
 
Chapter 3 Notes-Environmental Health
Chapter 3 Notes-Environmental HealthChapter 3 Notes-Environmental Health
Chapter 3 Notes-Environmental Health
 
Planning an achievement test and assessment
Planning an achievement test and assessmentPlanning an achievement test and assessment
Planning an achievement test and assessment
 
Some concepts in health
Some concepts in healthSome concepts in health
Some concepts in health
 

Ähnlich wie Development of health measurement scales – part 2

Chapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptx
Chapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptxChapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptx
Chapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptxHazelLansula1
 
unit 9 measurements presentation- short.ppt
unit 9 measurements presentation- short.pptunit 9 measurements presentation- short.ppt
unit 9 measurements presentation- short.pptMitikuTeka1
 
7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdf7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdfezaldeen2013
 
Scaling and measurement technique
Scaling and measurement techniqueScaling and measurement technique
Scaling and measurement techniqueSiddharth Gupta
 
Questionnaire and Instrument validity
Questionnaire and Instrument validityQuestionnaire and Instrument validity
Questionnaire and Instrument validitymdanaee
 
1625941932480.pptx
1625941932480.pptx1625941932480.pptx
1625941932480.pptxMathiQueeny
 
Basic Statistical Concepts.pdf
Basic Statistical Concepts.pdfBasic Statistical Concepts.pdf
Basic Statistical Concepts.pdfKwangheeJung
 
Reliability of test
Reliability of testReliability of test
Reliability of testSarat Rout
 
Non parametric study; Statistical approach for med student
Non parametric study; Statistical approach for med student Non parametric study; Statistical approach for med student
Non parametric study; Statistical approach for med student Dr. Rupendra Bharti
 
Chapter 11 quantitative data
Chapter 11 quantitative dataChapter 11 quantitative data
Chapter 11 quantitative datau59
 
A presentation for Multiple linear regression.ppt
A presentation for Multiple linear regression.pptA presentation for Multiple linear regression.ppt
A presentation for Multiple linear regression.pptvigia41
 

Ähnlich wie Development of health measurement scales – part 2 (20)

Agreement analysis
Agreement analysisAgreement analysis
Agreement analysis
 
Validity andreliability
Validity andreliabilityValidity andreliability
Validity andreliability
 
Data analysis
Data analysisData analysis
Data analysis
 
UNIT 5.pptx
UNIT 5.pptxUNIT 5.pptx
UNIT 5.pptx
 
MSA (GR&R)
MSA (GR&R)MSA (GR&R)
MSA (GR&R)
 
Chapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptx
Chapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptxChapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptx
Chapter 2 The Science of Psychological Measurement (Alivio, Ansula).pptx
 
PA_EPGDM_2_2023.pptx
PA_EPGDM_2_2023.pptxPA_EPGDM_2_2023.pptx
PA_EPGDM_2_2023.pptx
 
unit 9 measurements presentation- short.ppt
unit 9 measurements presentation- short.pptunit 9 measurements presentation- short.ppt
unit 9 measurements presentation- short.ppt
 
7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdf7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdf
 
Chi squared test
Chi squared testChi squared test
Chi squared test
 
Scaling and measurement technique
Scaling and measurement techniqueScaling and measurement technique
Scaling and measurement technique
 
Questionnaire and Instrument validity
Questionnaire and Instrument validityQuestionnaire and Instrument validity
Questionnaire and Instrument validity
 
1625941932480.pptx
1625941932480.pptx1625941932480.pptx
1625941932480.pptx
 
Basic Statistical Concepts.pdf
Basic Statistical Concepts.pdfBasic Statistical Concepts.pdf
Basic Statistical Concepts.pdf
 
Reliability of test
Reliability of testReliability of test
Reliability of test
 
Non parametric study; Statistical approach for med student
Non parametric study; Statistical approach for med student Non parametric study; Statistical approach for med student
Non parametric study; Statistical approach for med student
 
Chisquare Test
Chisquare Test Chisquare Test
Chisquare Test
 
Hypothsis testing
Hypothsis testingHypothsis testing
Hypothsis testing
 
Chapter 11 quantitative data
Chapter 11 quantitative dataChapter 11 quantitative data
Chapter 11 quantitative data
 
A presentation for Multiple linear regression.ppt
A presentation for Multiple linear regression.pptA presentation for Multiple linear regression.ppt
A presentation for Multiple linear regression.ppt
 

Mehr von Rizwan S A

Introduction to scoping reviews
Introduction to scoping reviewsIntroduction to scoping reviews
Introduction to scoping reviewsRizwan S A
 
Sources of demographic data 2019
Sources of demographic data 2019Sources of demographic data 2019
Sources of demographic data 2019Rizwan S A
 
Effect sizes in meta-analysis
Effect sizes in meta-analysisEffect sizes in meta-analysis
Effect sizes in meta-analysisRizwan S A
 
Presenting the results of meta-analysis
Presenting the results of meta-analysisPresenting the results of meta-analysis
Presenting the results of meta-analysisRizwan S A
 
Heterogeneity in meta-analysis
Heterogeneity in meta-analysisHeterogeneity in meta-analysis
Heterogeneity in meta-analysisRizwan S A
 
Overview of the systematic review process
Overview of the systematic review processOverview of the systematic review process
Overview of the systematic review processRizwan S A
 
Biases in meta-analysis
Biases in meta-analysisBiases in meta-analysis
Biases in meta-analysisRizwan S A
 
Moderator analysis in meta-analysis
Moderator analysis in meta-analysisModerator analysis in meta-analysis
Moderator analysis in meta-analysisRizwan S A
 
Fixed-effect and random-effects models in meta-analysis
Fixed-effect and random-effects models in meta-analysisFixed-effect and random-effects models in meta-analysis
Fixed-effect and random-effects models in meta-analysisRizwan S A
 
Inverse variance method of meta-analysis and Cochran's Q
Inverse variance method of meta-analysis and Cochran's QInverse variance method of meta-analysis and Cochran's Q
Inverse variance method of meta-analysis and Cochran's QRizwan S A
 
Data extraction/coding and database structure in meta-analysis
Data extraction/coding and database structure in meta-analysisData extraction/coding and database structure in meta-analysis
Data extraction/coding and database structure in meta-analysisRizwan S A
 
Introduction & rationale for meta-analysis
Introduction & rationale for meta-analysisIntroduction & rationale for meta-analysis
Introduction & rationale for meta-analysisRizwan S A
 
Types of correlation coefficients
Types of correlation coefficientsTypes of correlation coefficients
Types of correlation coefficientsRizwan S A
 
Checking for normality (Normal distribution)
Checking for normality (Normal distribution)Checking for normality (Normal distribution)
Checking for normality (Normal distribution)Rizwan S A
 
Analysis of small datasets
Analysis of small datasetsAnalysis of small datasets
Analysis of small datasetsRizwan S A
 
A introduction to non-parametric tests
A introduction to non-parametric testsA introduction to non-parametric tests
A introduction to non-parametric testsRizwan S A
 
Kruskal Wallis test, Friedman test, Spearman Correlation
Kruskal Wallis test, Friedman test, Spearman CorrelationKruskal Wallis test, Friedman test, Spearman Correlation
Kruskal Wallis test, Friedman test, Spearman CorrelationRizwan S A
 
Kolmogorov Smirnov good-of-fit test
Kolmogorov Smirnov good-of-fit testKolmogorov Smirnov good-of-fit test
Kolmogorov Smirnov good-of-fit testRizwan S A
 
Mantel Haenszel methods in epidemiology (Stratification)
Mantel Haenszel methods in epidemiology (Stratification) Mantel Haenszel methods in epidemiology (Stratification)
Mantel Haenszel methods in epidemiology (Stratification) Rizwan S A
 
Use of checklists in critical appraisal of health literature
Use of checklists in critical appraisal of health literatureUse of checklists in critical appraisal of health literature
Use of checklists in critical appraisal of health literatureRizwan S A
 

Mehr von Rizwan S A (20)

Introduction to scoping reviews
Introduction to scoping reviewsIntroduction to scoping reviews
Introduction to scoping reviews
 
Sources of demographic data 2019
Sources of demographic data 2019Sources of demographic data 2019
Sources of demographic data 2019
 
Effect sizes in meta-analysis
Effect sizes in meta-analysisEffect sizes in meta-analysis
Effect sizes in meta-analysis
 
Presenting the results of meta-analysis
Presenting the results of meta-analysisPresenting the results of meta-analysis
Presenting the results of meta-analysis
 
Heterogeneity in meta-analysis
Heterogeneity in meta-analysisHeterogeneity in meta-analysis
Heterogeneity in meta-analysis
 
Overview of the systematic review process
Overview of the systematic review processOverview of the systematic review process
Overview of the systematic review process
 
Biases in meta-analysis
Biases in meta-analysisBiases in meta-analysis
Biases in meta-analysis
 
Moderator analysis in meta-analysis
Moderator analysis in meta-analysisModerator analysis in meta-analysis
Moderator analysis in meta-analysis
 
Fixed-effect and random-effects models in meta-analysis
Fixed-effect and random-effects models in meta-analysisFixed-effect and random-effects models in meta-analysis
Fixed-effect and random-effects models in meta-analysis
 
Inverse variance method of meta-analysis and Cochran's Q
Inverse variance method of meta-analysis and Cochran's QInverse variance method of meta-analysis and Cochran's Q
Inverse variance method of meta-analysis and Cochran's Q
 
Data extraction/coding and database structure in meta-analysis
Data extraction/coding and database structure in meta-analysisData extraction/coding and database structure in meta-analysis
Data extraction/coding and database structure in meta-analysis
 
Introduction & rationale for meta-analysis
Introduction & rationale for meta-analysisIntroduction & rationale for meta-analysis
Introduction & rationale for meta-analysis
 
Types of correlation coefficients
Types of correlation coefficientsTypes of correlation coefficients
Types of correlation coefficients
 
Checking for normality (Normal distribution)
Checking for normality (Normal distribution)Checking for normality (Normal distribution)
Checking for normality (Normal distribution)
 
Analysis of small datasets
Analysis of small datasetsAnalysis of small datasets
Analysis of small datasets
 
A introduction to non-parametric tests
A introduction to non-parametric testsA introduction to non-parametric tests
A introduction to non-parametric tests
 
Kruskal Wallis test, Friedman test, Spearman Correlation
Kruskal Wallis test, Friedman test, Spearman CorrelationKruskal Wallis test, Friedman test, Spearman Correlation
Kruskal Wallis test, Friedman test, Spearman Correlation
 
Kolmogorov Smirnov good-of-fit test
Kolmogorov Smirnov good-of-fit testKolmogorov Smirnov good-of-fit test
Kolmogorov Smirnov good-of-fit test
 
Mantel Haenszel methods in epidemiology (Stratification)
Mantel Haenszel methods in epidemiology (Stratification) Mantel Haenszel methods in epidemiology (Stratification)
Mantel Haenszel methods in epidemiology (Stratification)
 
Use of checklists in critical appraisal of health literature
Use of checklists in critical appraisal of health literatureUse of checklists in critical appraisal of health literature
Use of checklists in critical appraisal of health literature
 

Kürzlich hochgeladen

MedMatch: Your Health, Our Mission. Pitch deck.
MedMatch: Your Health, Our Mission. Pitch deck.MedMatch: Your Health, Our Mission. Pitch deck.
MedMatch: Your Health, Our Mission. Pitch deck.whalesdesign
 
ORAL HYPOGLYCAEMIC AGENTS - PART 2.pptx
ORAL HYPOGLYCAEMIC AGENTS  - PART 2.pptxORAL HYPOGLYCAEMIC AGENTS  - PART 2.pptx
ORAL HYPOGLYCAEMIC AGENTS - PART 2.pptxNIKITA BHUTE
 
Breast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptx
Breast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptxBreast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptx
Breast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptxNaveenkumar267201
 
Different drug regularity bodies in different countries.
Different drug regularity bodies in different countries.Different drug regularity bodies in different countries.
Different drug regularity bodies in different countries.kishan singh tomar
 
Female Reproductive Physiology Before Pregnancy
Female Reproductive Physiology Before PregnancyFemale Reproductive Physiology Before Pregnancy
Female Reproductive Physiology Before PregnancyMedicoseAcademics
 
Trustworthiness of AI based predictions Aachen 2024
Trustworthiness of AI based predictions Aachen 2024Trustworthiness of AI based predictions Aachen 2024
Trustworthiness of AI based predictions Aachen 2024EwoutSteyerberg1
 
Male Infertility Panel Discussion by Dr Sujoy Dasgupta
Male Infertility Panel Discussion by Dr Sujoy DasguptaMale Infertility Panel Discussion by Dr Sujoy Dasgupta
Male Infertility Panel Discussion by Dr Sujoy DasguptaSujoy Dasgupta
 
Role of Soap based and synthetic or syndets bar
Role of  Soap based and synthetic or syndets barRole of  Soap based and synthetic or syndets bar
Role of Soap based and synthetic or syndets barmohitRahangdale
 
SGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdf
SGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdfSGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdf
SGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdfHongBiThi1
 
Neurological history taking (2024) .
Neurological  history  taking  (2024)  .Neurological  history  taking  (2024)  .
Neurological history taking (2024) .Mohamed Rizk Khodair
 
DNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptx
DNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptxDNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptx
DNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptxMAsifAhmad
 
Red Blood Cells_anemia & polycythemia.pdf
Red Blood Cells_anemia & polycythemia.pdfRed Blood Cells_anemia & polycythemia.pdf
Red Blood Cells_anemia & polycythemia.pdfMedicoseAcademics
 
Bulimia nervosa ( Eating Disorders) Mental Health Nursing.
Bulimia nervosa ( Eating Disorders) Mental Health Nursing.Bulimia nervosa ( Eating Disorders) Mental Health Nursing.
Bulimia nervosa ( Eating Disorders) Mental Health Nursing.aarjukhadka22
 
Generative AI in Health Care a scoping review and a persoanl experience.
Generative AI in Health Care a scoping review and a persoanl experience.Generative AI in Health Care a scoping review and a persoanl experience.
Generative AI in Health Care a scoping review and a persoanl experience.Vaikunthan Rajaratnam
 
Unit I herbs as raw materials, biodynamic agriculture.ppt
Unit I herbs as raw materials, biodynamic agriculture.pptUnit I herbs as raw materials, biodynamic agriculture.ppt
Unit I herbs as raw materials, biodynamic agriculture.pptPradnya Wadekar
 
SGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdf
SGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdfSGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdf
SGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdfHongBiThi1
 
blood bank management system project report
blood bank management system project reportblood bank management system project report
blood bank management system project reportNARMADAPETROLEUMGAS
 
Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Peter Embi
 

Kürzlich hochgeladen (20)

MedMatch: Your Health, Our Mission. Pitch deck.
MedMatch: Your Health, Our Mission. Pitch deck.MedMatch: Your Health, Our Mission. Pitch deck.
MedMatch: Your Health, Our Mission. Pitch deck.
 
ORAL HYPOGLYCAEMIC AGENTS - PART 2.pptx
ORAL HYPOGLYCAEMIC AGENTS  - PART 2.pptxORAL HYPOGLYCAEMIC AGENTS  - PART 2.pptx
ORAL HYPOGLYCAEMIC AGENTS - PART 2.pptx
 
Breast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptx
Breast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptxBreast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptx
Breast cancer -ONCO IN MEDICAL AND SURGICAL NURSING.pptx
 
Different drug regularity bodies in different countries.
Different drug regularity bodies in different countries.Different drug regularity bodies in different countries.
Different drug regularity bodies in different countries.
 
Female Reproductive Physiology Before Pregnancy
Female Reproductive Physiology Before PregnancyFemale Reproductive Physiology Before Pregnancy
Female Reproductive Physiology Before Pregnancy
 
How to master Steroid (glucocorticoids) prescription, different scenarios, ca...
How to master Steroid (glucocorticoids) prescription, different scenarios, ca...How to master Steroid (glucocorticoids) prescription, different scenarios, ca...
How to master Steroid (glucocorticoids) prescription, different scenarios, ca...
 
Trustworthiness of AI based predictions Aachen 2024
Trustworthiness of AI based predictions Aachen 2024Trustworthiness of AI based predictions Aachen 2024
Trustworthiness of AI based predictions Aachen 2024
 
Male Infertility Panel Discussion by Dr Sujoy Dasgupta
Male Infertility Panel Discussion by Dr Sujoy DasguptaMale Infertility Panel Discussion by Dr Sujoy Dasgupta
Male Infertility Panel Discussion by Dr Sujoy Dasgupta
 
Role of Soap based and synthetic or syndets bar
Role of  Soap based and synthetic or syndets barRole of  Soap based and synthetic or syndets bar
Role of Soap based and synthetic or syndets bar
 
SGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdf
SGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdfSGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdf
SGK LEUKEMIA KINH DÒNG BẠCH CÂU HẠT HAY.pdf
 
Neurological history taking (2024) .
Neurological  history  taking  (2024)  .Neurological  history  taking  (2024)  .
Neurological history taking (2024) .
 
DNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptx
DNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptxDNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptx
DNA nucleotides Blast in NCBI and Phylogeny using MEGA Xi.pptx
 
Red Blood Cells_anemia & polycythemia.pdf
Red Blood Cells_anemia & polycythemia.pdfRed Blood Cells_anemia & polycythemia.pdf
Red Blood Cells_anemia & polycythemia.pdf
 
Bulimia nervosa ( Eating Disorders) Mental Health Nursing.
Bulimia nervosa ( Eating Disorders) Mental Health Nursing.Bulimia nervosa ( Eating Disorders) Mental Health Nursing.
Bulimia nervosa ( Eating Disorders) Mental Health Nursing.
 
Generative AI in Health Care a scoping review and a persoanl experience.
Generative AI in Health Care a scoping review and a persoanl experience.Generative AI in Health Care a scoping review and a persoanl experience.
Generative AI in Health Care a scoping review and a persoanl experience.
 
Unit I herbs as raw materials, biodynamic agriculture.ppt
Unit I herbs as raw materials, biodynamic agriculture.pptUnit I herbs as raw materials, biodynamic agriculture.ppt
Unit I herbs as raw materials, biodynamic agriculture.ppt
 
SGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdf
SGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdfSGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdf
SGK RỐI LOẠN TOAN KIỀM ĐHYHN RẤT HAY VÀ ĐẶC SẮC.pdf
 
blood bank management system project report
blood bank management system project reportblood bank management system project report
blood bank management system project report
 
Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024
 
GOUT UPDATE AHMED YEHIA 2024, case based approach with application of the lat...
GOUT UPDATE AHMED YEHIA 2024, case based approach with application of the lat...GOUT UPDATE AHMED YEHIA 2024, case based approach with application of the lat...
GOUT UPDATE AHMED YEHIA 2024, case based approach with application of the lat...
 

Development of health measurement scales – part 2

  • 13. Results
        Patient      Item 1   Item 2   Item 3   Summed scale score
        1            0        1        1        2
        2            1        1        1        3
        3            0        0        0        0
        4            1        1        1        3
        5            1        1        0        2
        % positive   3/5=.6   4/5=.8   3/5=.6   mean score = 2
        15
  • 14. Calculations
      Mean summed score = 2
      Sample variance = [(2 − 2)² + (3 − 2)² + (0 − 2)² + (3 − 2)² + (2 − 2)²] / (5 − 1) = 1.5
      Cronbach's coefficient alpha = [k / (k − 1)] × [1 − Σ (%pos)i(%neg)i / Var]
                                   = (3/2) × [1 − ((.6)(.4) + (.8)(.2) + (.6)(.4)) / 1.5]
                                   = 0.86
      Conclude that this scale has good reliability
      16
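The same arithmetic can be checked in a few lines of code. This is a minimal sketch (not part of the original slides); it follows the slide's own convention of using %pos × %neg for the item variances and the n − 1 sample variance for the summed scores.

```python
# Reproduces the 5-patient, 3-item emotional-health example (alpha ~ 0.86).
import numpy as np

scores = np.array([
    [0, 1, 1],   # patient 1 -> summed score 2
    [1, 1, 1],   # patient 2 -> 3
    [0, 0, 0],   # patient 3 -> 0
    [1, 1, 1],   # patient 4 -> 3
    [1, 1, 0],   # patient 5 -> 2
])

k = scores.shape[1]                          # number of items (3)
p = scores.mean(axis=0)                      # proportion answering "yes" per item
pq = p * (1 - p)                             # item variances as %pos x %neg, as on the slide
var_total = scores.sum(axis=1).var(ddof=1)   # variance of the summed scores over n-1 (= 1.5)

alpha = (k / (k - 1)) * (1 - pq.sum() / var_total)
print(round(alpha, 2))                       # 0.86
```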
  • 15. Multifactor inventories
      • More sophisticated techniques
      • Item-total procedure – each item should correlate with the total of its scale and the total of all the scales
      • Factor analysis
        – Determining the underlying factors
        – For eg., if there are five tests
          • Vocabulary, fluency, phonetics, reasoning and arithmetic
          • We can theorize that the first three would be correlated under a factor called 'verbal factor' and the last two under 'logic factor'
      20
  • 16. Stability / measuring error
      • A weighing machine shows weights in the range of, say, 40–80 kg, so an error of 1 kg is meaningful
      • A ratio is more useful: measurement error / total variability between subjects
      • But in practice we calculate the ratio: variability between subjects / total variability
        (total variability includes both subject variability and measurement error)
      • So that a ratio of
        – 1 indicates no measurement error / perfect reliability
        – 0 indicates the opposite
      21
  • 17.
      • Reliability = subject variability / (subject variability + measurement error)
      • Statistically, 'variance' is the measure of variability, so
      • Reliability = SD² of subjects / (SD² of subjects + SD² of error)
      • Thus reliability is the proportion of the total variance that is due to the 'true' differences between subjects
      • Reliability has meaning only when applied to specific populations
      22
  • 18.
      1. Measurement error / total variability between subjects
      2. Variability between subjects / total variability
      3. Subject variability / (subject variability + measurement error)
      4. SD² of subjects / (SD² of subjects + SD² of error)
      23
  • 19. Calculation of reliability • The statistical technique used is ANOVA and since we have repeated measurements in reliability, the method is – repeated measures ANOVA 24
  • 23. • Classical definition of reliability • Interpretation is that 88% of the variance is due to the true variance among patients (aka Intraclass Correlation Coefficient, ICC) 28
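The worked repeated-measures ANOVA table behind this 88% figure appears as an image in the original slides and is not reproduced in the transcript. Below is a minimal sketch, on made-up ratings, of how the variance components combine into an intraclass correlation; it uses the two-way random-effects form for single ratings (Shrout–Fleiss ICC(2,1)) as one reasonable choice, not necessarily the exact variant used in the slides.

```python
# Hypothetical 5-subject x 3-observer table; rows = subjects, columns = observers.
import numpy as np

ratings = np.array([
    [6, 7, 6],
    [9, 10, 9],
    [4, 5, 4],
    [8, 8, 9],
    [2, 3, 2],
], dtype=float)

n, k = ratings.shape
grand = ratings.mean()
ss_subj = k * ((ratings.mean(axis=1) - grand) ** 2).sum()   # between-subject sum of squares
ss_obs  = n * ((ratings.mean(axis=0) - grand) ** 2).sum()   # between-observer sum of squares
ss_err  = ((ratings - grand) ** 2).sum() - ss_subj - ss_obs # residual (error) sum of squares

ms_subj = ss_subj / (n - 1)
ms_obs  = ss_obs / (k - 1)
ms_err  = ss_err / ((n - 1) * (k - 1))

# Proportion of total variance attributable to true differences between subjects
icc = (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err + k * (ms_obs - ms_err) / n)
print(round(icc, 2))   # ~0.96 for this made-up table
```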
  • 24. Fixed/random factor
      • What happened to the variance due to observers?
      • Are these the same observers who will be used in practice, or are they a random sample of possible observers?
      • Another situation where observations may be treated as fixed is when subjects answer the 'same items on a scale'
      29
  • 25. Other types of reliability
      • So far we have only examined the effect of different observers rating the same behaviour
      • But there can also be error due to day-to-day differences: if we measure the same behaviour a week or two apart we can calculate an 'intra-observer reliability coefficient'
      • If there are no observers (self-rated tests) we can still calculate 'test-retest reliability'
      31
  • 26.
      • Usually high inter-observer reliability is sufficient, but if it is low then we may have to calculate intra-observer reliability to determine the source of unreliability
      • Mostly, measures of internal consistency are reported as 'reliability', because they are easily computed in a single sitting
        – Hence caution is required, as they may not capture variability due to day-to-day differences
      32
  • 27. Different forms of reliability coefficient
      • So far we have seen forms of the ICC
      • Others
        – Pearson product-moment correlation
        – Cohen's kappa
        – Bland–Altman analysis
      33
  • 28. Pearson's correlation
      • Based on regression – the extent to which the relation between two variables can be described by a straight line
      34
  • 29. Limitations of Pearson's R
      • A perfect fit of 1.0 may be obtained even if the intercept is non-zero and the slope is not equal to one, unlike with the ICC
      • So Pearson's R will be higher than the true reliability, but in practice it is usually close to the ICC because the predominant source of error is random variation
      • If there are multiple observers, multiple pairwise Rs are required, unlike the single ICC
      • For eg. with 10 observers there will be 45 Pearson's Rs, whereas only one ICC
      35
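A toy illustration of both points (illustrative numbers only, not from the slides):

```python
import math
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = a + 5.0                        # second observer systematically scores 5 units higher
print(np.corrcoef(a, b)[0, 1])     # 1.0 (to floating point): Pearson ignores the constant shift

print(math.comb(10, 2))            # 45 pairwise correlations needed for 10 observers
```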
  • 33.
      • Used when responses are dichotomous/categorical
      • When the frequency of positive results is very low or very high, chance agreement is high and kappa tends to be low even when raw agreement is high
      • Weighted kappa focuses on disagreement; cells are weighted according to their distance from the diagonal of agreement
      • Weighting can be arbitrary or can use quadratic weights (based on the square of the amount of discrepancy)
      • The quadratic scheme of weighted kappa is equivalent to the ICC
      • Also, the unweighted kappa is equal to the ICC based on ANOVA
      39
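A minimal sketch of unweighted and quadratic-weighted kappa for two raters on a three-category item; the ratings below are made up for illustration, not data from the slides.

```python
import numpy as np

rater1 = np.array([0, 0, 1, 1, 2, 2, 1, 0, 2, 1])
rater2 = np.array([0, 1, 1, 1, 2, 1, 1, 0, 2, 2])
n_cat = 3

# Observed contingency table (as proportions) and chance-expected proportions
table = np.zeros((n_cat, n_cat))
for a, b in zip(rater1, rater2):
    table[a, b] += 1
obs = table / table.sum()
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0))

# Disagreement weights: 0/1 for unweighted kappa, squared distance for quadratic weights
i, j = np.indices((n_cat, n_cat))
w_unweighted = (i != j).astype(float)
w_quadratic = ((i - j) ** 2) / (n_cat - 1) ** 2

def kappa(disagreement_weights):
    # 1 - (weighted observed disagreement) / (weighted chance disagreement)
    return 1 - (disagreement_weights * obs).sum() / (disagreement_weights * expected).sum()

print(round(kappa(w_unweighted), 2), round(kappa(w_quadratic), 2))
```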
  • 34. Bland and Altman method • A plot of difference between two observations against the mean of the two observations 40
  • 36.
      • The mean difference is related to the observer variance in the ICC, and the SD of the differences is related to the error variance in the ICC
      • Limits of agreement are calculated as the mean difference ± 2 SD of the differences
      • Agreement is expressed as the 'limits of agreement'. The presentation of the 95% limits of agreement allows a visual judgement of how well two methods of measurement agree; the smaller the range between the two limits, the better the agreement
      • How small is small depends on the clinical context: would a difference between the measurement methods as extreme as that described by the 95% limits of agreement meaningfully affect the interpretation of the results?
      • Limitation – the onus is placed on the reader to juxtapose the calculated error against some implicit notion of true variability
      42
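A minimal sketch of the limits-of-agreement calculation on hypothetical paired measurements (the values are invented; the slides' own plot is not reproduced here):

```python
import numpy as np

method_a = np.array([100, 112, 95, 130, 124, 90, 105, 118], dtype=float)
method_b = np.array([104, 110, 99, 128, 130, 93, 108, 121], dtype=float)

diff = method_a - method_b
mean_pair = (method_a + method_b) / 2      # x-axis of a Bland-Altman plot, if drawn

bias = diff.mean()                         # mean difference between the two methods
sd_diff = diff.std(ddof=1)                 # SD of the differences
lower, upper = bias - 2 * sd_diff, bias + 2 * sd_diff   # 'mean difference +/- 2 SD' limits
print(round(bias, 2), round(lower, 2), round(upper, 2))
```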
  • 37. Issues in interpretation: SE of measurement and reliability
      • R is a dimension-less ratio of variances, so it is difficult to interpret R in terms of an individual score
      • SEM = SD × √(1 − R)
      • If we knew someone's true score, we could estimate the limits within which the observed value would lie 68% or 95% of the time
      • Eg. a scale with SD 10 and R 0.8: if the true score were 15, then 68% of the time the observed value would fall between 10.5 and 19.5
      43
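The worked figures on this slide can be reproduced directly:

```python
import math

sd, reliability, true_score = 10.0, 0.8, 15.0
sem = sd * math.sqrt(1 - reliability)                            # standard error of measurement, ~4.47
print(round(true_score - sem, 1), round(true_score + sem, 1))    # ~10.5 and ~19.5 (68% band)
```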
  • 38. Standards for the magnitude of reliability coefficients
      • How much reliability is good? Kelly (0.94), Stewart (0.85)
      • A test used for individual judgement should have higher reliability than one used for research on groups
      • For research purposes
        – The mean score and the sample size will reduce the error
        – Conclusions are usually made after a series of studies
        – Acceptable reliability is dependent on the sample size in research
      45
  • 39. Reliability and probability of misclassification • Depends on the property of the instrument and the decision of cut point • Relation between reliability and likelihood of misclassification – Eg. A sample of 100, one person ranked 25th and another 50th – If the R is 0, 50% chance that the two will reverse order on retesting – If R is 0.5, 37% chance, with R=0.8, 2.2% chance • Hence R of 0.75 is minimum requirement for a useful instrument 46
  • 40. Improving reliability • Increase the subject variance relative to the error variance (by legitimate means and otherwise) • Reducing error variance – Observer/rater training – Removing consistently extreme observers – Designing better scales • Increasing true variance – In case of ‗floor‘ or ‗ceiling‘ effect, introduce items that will bring the performance to the middle of the scale (thus increasing true variance) • Eg. Fair-good-very good-excellent (instead of bad-good) 47
  • 41.
      • Ways that are not legitimate
        – Testing the scale in a heterogeneous population (normal subjects and bedridden arthritics)
        – A scale developed in a homogeneous population will show a larger reliability when used in a heterogeneous population
      • Correct for attenuation
      48
  • 42.
      • The simplest way to increase R is to increase the number of items
      • True variance increases as the square of the number of items, whereas error variance increases only as the number of items
      • If the length of the test is tripled, then R(Spearman-Brown) = 3R / (1 + 2R)
      49
  • 43.
      • In reality the equation overestimates the new reliability
      • We can also use this equation to determine the length of test needed to achieve a pre-decided reliability
      • To improve test-retest reliability – shorten the interval between the tests
      • An ideal approach is to examine all the sources of variation and try to reduce the larger ones (generalizability theory)
      50
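A minimal sketch of the Spearman-Brown prophecy formula and of solving it for the required test length; the numerical values 0.6 and 0.9 below are illustrative, not from the slides.

```python
def spearman_brown(r, k):
    """Predicted reliability when the number of items is multiplied by a factor k."""
    return k * r / (1 + (k - 1) * r)

def length_factor(r, target):
    """Factor by which the test must be lengthened to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

print(round(spearman_brown(0.6, 3), 2))    # tripling a test with R = 0.6 gives ~0.82
print(round(length_factor(0.6, 0.9), 1))   # ~6.0: need six times as many items for R = 0.9
```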
  • 44. Sample size for reliability studies 51
  • 45. Summary for Reliability • Pearson R is theoretically incorrect but in practice fairly close • Bland and Altman method is analogous to error variance of ICC but doesn‘t relate this to the range of observations • kappa and ICC are identical and most appropriate 52
  • 46. Generalizability theory
      • Backdrop: classical test theory
        – All variance in scores can be divided into true and error variance (an overly simplistic assumption)
        – Doesn't exhaust all possible sources of variance
        – Doesn't account for interaction between sources of error variance
      • G theory – Cronbach et al., 1972
        – Its essence is the recognition that in any measurement situation there are multiple (possibly infinite) sources of error variance
      53
  • 48. Validity
      • Two steps to determine the usefulness of a scale
        – Reliability – necessary but not sufficient
        – Validity – the next step
      • Validity – is the test measuring what it is meant to measure?
      • Two important issues
        – The nature of what is being measured
          • Temperature vs. quality of life/social support (physical vs. abstract)
        – The relation to the purported cause
          • Sr. creatinine is a measure of kidney function because we know it is regulated by the kidneys
          • But will students who do volunteer work become better doctors?
          • Since our understanding of human behaviour is far from perfect, such predictions have to be validated against actual performance
      61
  • 50. Types of validity • Three Cs (conventionally) – Content – Criterion • Concurrent • Predictive – Construct – Others (face validity) • New types – Convergent, discriminant, trait etc., 63
  • 51. Differing perspectives
      • Previously, validity was seen as demonstrating the properties of the scale
      • Current thinking – what inferences can be made about the people who have given rise to the scores on these scales?
        – Thus validation is a process of hypothesis testing (eg. someone who scores high on test A will do worse on test B, and will differ from people who do better on tests C and D)
        – Researchers are limited only by their imagination in devising experiments to test such hypotheses
      • All types of validity address the same issue: the degree of confidence we can place in the inferences drawn from the scales
      64
  • 52. • Face validity – On the face of it the tool appears to be measuring what it is supposed to measure – Subjective judgment by one/more experts, rarely by empirical means • Content validity – Measures whether the tool includes all relevant domains or not – Closely related to face validity – aka. ‗validity by assumption‘ because an expert says so • Certain situations where these may not be desired - ? 65
  • 53. Content validity
      • Example – a cardiology exam
        – Assume it contains all aspects of the circulatory system (physiology, anatomy, pathology, pharmacology, etc.)
        – If a person scores high on this test, we can 'infer' that he knows much about the subject (i.e., our inferences about the person will be right across various situations)
        – In contrast, if the exam did not contain anything about circulation, the inferences we make about a high scorer may be wrong most of the time, and vice versa
      66
  • 54.
      • Generally, a measure that includes a more representative sample of the target behaviour will have more content validity and hence lead to more accurate inferences
      • Reliability places an upper limit on validity (the maximum validity is the square root of the reliability coefficient); the higher the reliability, the higher the maximum possible validity
        – One exception is the trade-off between internal consistency and validity (better to sacrifice IC for content validity)
        – The ultimate aim of a scale is inferential, which depends more on content validity than on internal consistency
      67
  • 55. Criterion validity • Correlation of a scale to an accepted ‗gold standard‘ • Two types – Concurrent (both the new scale and standard scale are given at the same time) – Predictive – the GS results will be available some time in the future (eg. Entrance test for college admission to assess if a person will graduate or not) • Why develop a new scale when we already have a criterion scale? – Diagnostic utility/substitutability – Predictive utility (no decision can be made on the basis of new scale) • Criterion contamination – If the result of the GS is in part determined in some way by the results of the new test, it may lead to an artificially high correlation 68
  • 56. Construct validity
      • Height, weight – readily observable
      • Psychological attributes – anxiety, pain, intelligence – are abstract variables and can't be directly observed
      • For eg. anxiety – we say that a person has anxiety if he has sweaty palms, tachycardia, pacing back and forth, difficulty in concentrating, etc. (i.e., we hypothesize that these symptoms are the result of anxiety)
      • Such proposed underlying factors are called hypothetical constructs / constructs (eg. anxiety, illness behaviour)
      • Such constructs arise from larger theories / clinical observations
      • Most psychological instruments tap some aspect of a construct
      69
  • 57. (Diagram) The construct 'rheumatoid arthritis' and its observable indicators: early morning stiffness; 3 or more joints involved, esp. small joints; elevated ESR, RA factor; X-ray changes 70
  • 58. Establishing construct validity • IBS is a construct rather than a disease – it is a diagnosis of exclusion • A large vocabulary, wide knowledge and problem solving skills – what is the underlying construct? • Many clinical syndromes are constructs rather than actual entities (schizophrenia, SLE) 71
  • 59.
      • Initial scales for IBS – ruling out other organic diseases plus some physical signs and symptoms
        – These scales were inadequate because they led to many missed and wrong diagnoses
        – New scales were developed incorporating demographic features and personality features
      • Now, how to assess the validity of this new scale?
        – Based on my theory, high scorers on this scale should have
          • Symptoms that will not clear with conventional therapy
          • A lower prevalence of organic bowel disease on autopsy
      72
  • 60. Differences from other types
      1. Content and criterion validity can be established in one or two studies, but there is no single experiment that can prove a construct
         • Construct validation is an ongoing process – learning more about the construct, making new predictions and then testing them
         • Each supportive study strengthens the construct, but one well designed negative study can question the entire construct
      2. We are assessing the theory as well as the measure at the same time
      73
  • 61. IBS example • We had predicted that IBS patients will not respond to conventional therapy • Assume that we gave the test to a sample of patients with GI symptoms and treated them with conventional therapy • If high scoring patients responded in the same proportion as low scorers then there are 3 possibilities – Our scale is good but theory wrong – Our theory is good but scale bad – Both scale and theory are bad • We can identify the reason only from further studies 74
  • 62. • If an experimental design is used to test the construct, then in addition to the above possibilities our experiment may be flawed • Ultimately, construct validity doesn‘t differ conceptually from other types of validity – All validity is at its base some form of construct validity… it is the basic meaning of validity – (Guion) 75
  • 63. Establishing construct validity • Extreme groups • Convergent and discriminant validity • Multitrait-multimethod matrix 76
  • 64. Extreme groups
      • Two groups – as decided by clinicians
        – One IBS and the other some other GI disease
        – Equivocal diagnoses eliminated
      • Two problems
        – Being able to separate two extreme groups implies that we already have a tool which meets our needs (however, we can do bootstrapping)
        – This is not sufficient; the real use of a scale is in making much finer discriminations. Such studies can still be a first step: if the scale fails this, it will probably be useless in practical situations
      77
  • 65. • Convergent validity - If there are two measures for the same construct, then they should correlate with each other but should not correlate too much. E.g. Index of anxiety and ANS awareness index • Divergent validity – the measure should not correlate with a measure of a different construct, eg. Anxiety index and intelligence index 78
  • 66. Multitrait-multimethod matrix
      • Two unrelated traits/constructs, each measured by two different methods
      • Eg. two traits – anxiety, intelligence; two methods – a rater, an exam
                                      Anxiety             Intelligence
                                      Rater    Exam       Rater    Exam
        Anxiety         Rater         0.53
                        Exam          0.42     0.79
        Intelligence    Rater         0.18     0.17       0.58
                        Exam          0.15     0.23       0.49     0.88
      – Purple (diagonal: 0.53, 0.79, 0.58, 0.88) – reliabilities of the four instruments (should be highest)
      – Blue (0.42, 0.49) – homotrait-heteromethod corr. (convergent validity)
      – Yellow (0.18, 0.23) – heterotrait-homomethod corr. (divergent validity)
      – Red (0.17, 0.15) – heterotrait-heteromethod corr. (should be lowest)
      • A very powerful method, but it is very difficult to get such a combination
      79
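A small sketch that encodes the lower triangle of this matrix and checks the expected ordering (reliabilities above convergent validities, and convergent validities above the heterotrait correlations); the values are taken from the slide.

```python
# Values from the slide's multitrait-multimethod matrix (lower triangle).
reliabilities = [0.53, 0.79, 0.58, 0.88]   # same trait, same method (diagonal)
convergent    = [0.42, 0.49]               # same trait, different method
hetero_mono   = [0.18, 0.23]               # different trait, same method
hetero_hetero = [0.17, 0.15]               # different trait, different method

print(min(convergent) > max(hetero_mono + hetero_hetero))   # True: convergent exceed heterotrait
print(min(reliabilities) > max(convergent))                 # True: 0.53 > 0.49
```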
  • 67. Biases in validity assessment • Restriction in range • May be in new scale (MAO level) • May be in criterion (depression score) • A third variable correlated to both (severity) • Eg. A high correlation was found between MAO levels and depression score in community based study, but on replicating the study in hospital the correlation was low 80
  • 70. Measuring change • Ultimate goal of most treatment studies is to induce a change in the patient‘s status • Controversial views against and for scales which are more sensitive to change in health status • Goals of measuring change – To distinguish between those individuals who change a lot and those who change little – To identify correlates of change – To infer treatment effects from group differences 83
  • 71.
      • It is easier to demonstrate a consistent change in all the subjects than different amounts of change in different subjects
      • Why don't we measure change directly?
        – We could ask patients how they have changed since they were put on the treatment, but people simply do not remember how they were at the beginning (the validity of such responses is debatable)
        – The most defensible way to assess change is to measure the attribute directly at the beginning of the study and again on one or more subsequent occasions
      84
  • 72. Measures of association • Reliability • Sensitivity to change 85
  • 74. Sensitivity to change from treatment effects 87
  • 75. Item response theory
      • Limitations of G theory
        – Subject/population specific
        – Difficult to compare a person's score on two or more different tests (converting to z scores requires a normality assumption, which is not always correct)
        – Homoscedasticity assumption – that errors are the same at the ends as in the middle range of scores
        – Assumption that all items have equal valences
      • Classical test theory – difficult to separate the properties of the test from the attributes of the people taking it: the tool's properties change as the people tested change, and the people's apparent properties change as the test changes
      91
  • 76. • IRT – claims to rectify these limitations – Based on two ‗hard assumptions‘ 1. Data are unidimensional (tap only one trait) 2. The probability of answering any item in positive direction is unrelated to the probability of answering any other item positively for people with the same amount of the trait (local independence) – Two postulates 1. Performance of a subject can be predicted by a set of factors called ‗traits‘ or abilities, latent traits (theta) 2. The relationship between a person‘s performance on any item and the underlying trait can be described by an ‘item characteristic curve’ 92
  • 77. • Some important properties of ICC – They are ‗ogives‘, usually – Monotonic; the prob. of answering in a positive direction consistently increases as the score on the trait increases – Differ from each other in three dimensions • Slope • Location along the trait • The flattening out at the bottom – Can be thought of as ‗imperfect‘ Guttman scales 93
  • 78. Item characteristic curve • Q. A is a better discriminator than Q. B • Q. B is harder than Q. A 94
  • 79. Different models of ICC • One parameter model (Rasch model) – Assumes that all items have equal discriminating ability but different difficulty 95
  • 80. • Two parameter model – Assumes that both discriminating ability and difficulty differ 96
  • 81.
      • Three parameter model
        – In addition to the two parameters, the lower end of the tail asymptotes at some probability greater than 0
        – This takes care of the fact that people may answer items correctly by guessing / by chance
      97
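A minimal sketch of the three logistic item characteristic curve models described above; the parameter values are illustrative only, not items from any particular scale.

```python
import numpy as np

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """P(positive response | trait level theta): discrimination a, difficulty b, guessing c.
    c = 0 gives the two-parameter model; also fixing a across items gives the Rasch (1PL) model."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(icc_3pl(theta))                   # baseline curve (a=1, b=0, c=0)
print(icc_3pl(theta, a=2.0, b=1.0))     # steeper (more discriminating) and harder item
print(icc_3pl(theta, c=0.2))            # three-parameter: lower tail flattens out at 0.2
```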
  • 82. Deriving the curves • Taking a large number of subjects (200 for one-parameter model, 1000 to estimate the 3 parameter model) • Random sampling Vs. latent trait model – In random sampling – it is not necessary to know much about the items but large pool of items required – Latent trait model – fewer items are required but every item should be known in detail 98
  • 83. Advantages and disadvantages • Allows test-free measurement; people can be compared to one another even if they took different items • Eg. Wide Range Achievement Test • People in different levels can be given different items and yet be placed on the same scale at the end (adaptive/tailored testing) • Not widely used because – Large sample size needed to estimate the parameters – Assumptions are difficult to meet 99
  • 84. Future guidelines for developing health measurements
      1. Articles/manuals should give a full description of the purpose, population and intended use
      2. The rationale for the design of the instrument – a conceptual definition of the object of measurement
      3. Describe the ways in which questions were selected
      4. Revisions, if any, should be stated along with reliability and validity
      5. Clear instructions for standard administration and scoring
      6. Reliability and validity testing should examine both the internal structure and its relation to alternative measurements of the concept
      7. The tool should be tested by users other than the original authors
      100
  • 85. Critical appraisal – Rcq - 36
      • What is the population in this study?
      • What is the type of scale?
      • What is the scaling method used?
      • Have they missed any method for item generation?
      • Is Cronbach's alpha calculated appropriately, and is the scale reliable?
      • Is it appropriate to calculate the mean (SD) for each domain?
      • Have they established construct validity in this study? Comment on the MTMM matrix used.
      • Can this scale be used to measure treatment effects for RC?
      101
  • 86. Thank you
      "Belief is no substitute for arithmetic" – Henry Spencer
      102