Journal of Evaluation in Clinical Practice, 7, 2, 135–148
An illustrated guide to the methods of meta-analysis
Alexander J. Sutton BSc MSc¹, Keith R. Abrams BSc MSc PhD² and David R. Jones BA MSc PhD CStat CMath DipTCDHE³
¹ Lecturer in Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK
² Reader in Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK
³ Professor of Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK
Correspondence
Mr Alex J Sutton
Department of Epidemiology and Public Health
University of Leicester
22-28 Princess Road West
Leicester LE1 6TP
UK
Keywords: Bayesian methods, hospital discharge, meta-analysis, methods, re-admission, review
Accepted for publication: 22 July 2000
Abstract
Meta-analysis is now accepted as a necessary tool for the evaluation of health care. Such analyses have been carried out in virtually every area of medicine to evaluate a wide spectrum of health care interventions and policies. This paper has three broad aims: (1) to describe the basic principles of meta-analysis, using a meta-analysis of interventions intended to reduce hospital re-admission rates for illustration; (2) to consider threats to the internal validity of meta-analysis, and the measures which can be taken to minimize their impact; and (3) to present an overview of more specialist and developing methods for synthesizing data, with the intention of outlining the directions meta-analysis may take in the future. The methods used to synthesize studies, which take ‘weighted averages’ of effect sizes, have been refined to a high degree, while the methods for dealing with threats to the validity of meta-analyses, such as publication bias and variations in quality of the primary studies, are at a less advanced stage. However, many consider this standard ‘weighted average’ approach to meta-analysis not to be ‘state of the art’ in at least some situations, where the use of more sophisticated methods, generally to explain variation in estimates from different studies and to synthesize a broader base of evidence, would be advantageous. Currently, approaches which attempt to do this are mainly still in the experimental stage and, unfortunately, ideas which sound natural and appealing are often difficult to implement in practice. Clearly, it will be some time before they are used routinely, but significant steps have been made.
1 Introduction
Meta-analysis is now accepted as a necessary tool for the evaluation of health care. Such analyses have been carried out in virtually every area of medicine to evaluate a wide spectrum of health-care interventions and policies. The primary aim of many meta-analyses is to produce a more accurate estimate of the effect of a particular intervention, or group of interventions, than is possible using only a single study.
Since different studies are carried out using different populations, different designs and a whole range of other study-specific factors, it has been suggested that combining them will produce an estimate that has broader generalizability than any single study. Additionally, it may be possible to explain the differences between results from individual studies by carrying out a meta-analysis. Such an assessment may even provide further insight into the intervention, and develop our understanding of how it works.
© 2001 Blackwell Science 135
A.J. Sutton et al.
Concurrent with the explosion in the use of meta-analysis is the continued development and refinement of the methods used to carry out such analyses. This is an important endeavour, because the science of meta-analysis is still in its infancy, and in the past over-simplistic methods have led to misleading conclusions (Hunt 1997). A systematic review of methodology for meta-analysis carried out by the authors (Sutton et al. 1998) informed the writing of this paper, and is recommended further reading for more technical details on the material presented here. The reader should note, however, that several important developments which are noted here have been published in the short time since the review was written, confirming the speed with which this field continues to develop.
This paper has three broad aims: (1) to describe the basic principles of meta-analysis using a worked example; (2) to consider the threats to the validity of meta-analysis and the measures which can be taken to minimize their impact; and (3) to present an overview of more specialist and developing methods, with the intention of outlining the directions meta-analysis may take in the future. The term ‘meta-analysis’ is used to describe different aspects of research synthesis by different people. In some contexts it is used to indicate the whole review process, including aspects such as literature searching and data extraction, as well as the statistical combination of quantitative results. We prefer to use the term ‘systematic review’ to indicate the whole review process, restricting the term ‘meta-analysis’ to describe the synthesis of quantitative data from multiple studies. Although many recent advances in pre-synthesis review methods have been made, such as the development of sophisticated searching methods (Sutton et al. 1998; Dickersin et al. 1994), this paper focuses solely on aspects of quantitative data synthesis, or meta-analysis. [Note: very often a systematic review will include a meta-analysis; however, if no quantitative data are available from the primary reports, or the data that are available are deemed too heterogeneous to be meaningfully combined, then only a narrative description of the studies may be carried out (Sutton et al. 1998).] Guidelines for good practice for the pre-synthesis aspects of systematic reviews have been described comprehensively elsewhere (Deeks et al. 1996; Oxman 1996). Very importantly, a recent initiative has produced a checklist addressing the quality of reporting of meta-analyses (QUOROM) (Moher et al. 1999b). This statement is in the same spirit as the CONSORT statement for reporting randomized clinical trials (RCTs) (Begg et al. 1996) and is recommended reading for those preparing reports of meta-analyses of RCTs.
2 The synthesis of estimates of effectiveness from multiple primary studies
This section focuses on pooling results from a number of studies investigating the relative effectiveness of an intervention. Often, meta-analyses of this sort include only RCTs, typically with two arms: one arm receiving experimental treatment and the other control, placebo or standard treatment. (The issues of variable study quality and of synthesizing studies with different designs are considered in sections 3 and 4, respectively.) Data from a meta-analysis of interventions intended to improve the process of hospital discharge of older people, published elsewhere (Parker et al. 2001), are used to illustrate the methods. Thirty two-arm RCTs are included in the meta-analysis, and the outcome focused on here is the re-admission rate to hospital following discharge. In the remainder of this section the principal ideas involved in performing a meta-analysis are explained and, where possible, the calculations required are reproduced to aid understanding. In practice, the use of computer software greatly facilitates the analyses required. The meta-analysis capabilities of many common statistical analysis packages are limited; however, much specialist software has been developed recently (Sutton et al. 2000b; Sterne et al. 2001).
Calculation of an effect size for each study
Broadly speaking, quantitative outcomes from any study can be classified as belonging to one of three data types: (i) binary, e.g. indicating the presence or absence of the event of interest in each patient; (ii) continuous, where outcome is measured on a continuous scale, e.g. change in blood pressure; or (iii) ordinal, where outcome is measured on an ordered categorical scale, e.g. a disease severity scale, where a patient can be classified as belonging to one of several distinct categories.
Meta-analysis methods
The approaches used to combine either binary or continuous outcomes are often similar, while ordinal data are somewhat more complex and require specialist methods, discussed elsewhere (Whitehead & Jones 1994).
Table 1 provides a sample of the data extracted from reports of 30 RCTs to be included in the meta-analysis (for a list of references for these RCTs see the original report (Parker et al. 2001); the numbers used to identify these RCTs in that report are provided here in the final column of Table 1). Columns three and six provide the number of patients randomized to the experimental and control arms of each study, respectively. [Note: analysis should usually be carried out on the basis of intention to treat (Hollis & Campbell 1999); if the analysis in the original study report was not performed using this method it may still be possible to extract the correct figures for the purposes of the meta-analysis.] Columns four and seven indicate the number of re-admission episodes. Note that an individual can have multiple re-admissions; for example, the new intervention arm of study 8 included 142 patients, while 554 events were reported. [Note: the fact that more than one re-admission is permitted for each patient means that an individual’s outcome is not binary.] Column two indicates the length of follow-up of the studies, which ranges from 1 to 12 months; it is necessary to account for follow-up when calculating effect sizes, since the number of re-admissions may be critically dependent on the length of the observation period of the trial.
An outcome measure which takes into account length of follow-up is the re-admission rate ratio (RRR). As the name suggests, this is the ratio of the re-admission rates (per month) in the two arms. The re-admission rate (RR) in each arm is calculated by:
$$\mathrm{RR} = \frac{\text{number of re-admissions}}{\text{number of patients} \times \text{length of follow-up (months)}}$$
For example, there are two re-admissions in 37 patients over 1.5 months in trial 1, so the RR is 2/(37 × 1.5) = 0.036. [Note: more decimal places are used in the working of the calculations in this paper than are printed.] Similarly, the RR in the control group is 0.162. The outcome of interest can now be calculated by dividing the RR in the treatment arm by the RR in the control arm, 0.036/0.162, which produces an RRR of 0.222. This RRR is less than one, which indicates the re-admission rate is lower in the treatment arm, suggesting that the intervention is beneficial. In this instance the estimated effect is large (a long way from 1). The RRs for each arm are provided in columns 5 and 8, and the RRR in column 9.
Although the RRR is the measure of interest, due to theoretical statistical considerations (including improved approximate normality), a natural logarithm transformation (ln(RRR)) is used for the purpose of combining studies via a meta-analysis (Fleiss 1994). The pooled result can afterwards be back-transformed by taking the exponential of the pooled ln(RRR), exp(pooled ln(RRR)), to convert the answer back to the RRR scale, allowing easier interpretation. The ln(RRR) estimates for each study are given in column 10 of Table 1.
A further value, the standard error (SE) of the ln(RRR), is required for the meta-analysis calculation. The SE gives an indication of the degree of precision to which each study estimates the effect size; a small SE indicates a precise estimate, usually from a large study. The SE for the ln(RRR) is calculated by:
$$\mathrm{SE}(\ln(\mathrm{RRR})) = \sqrt{\frac{1}{\text{re-admissions in experimental group}} + \frac{1}{\text{re-admissions in control group}}}$$
Hence, for study 1 the SE(ln(RRR)) is √(1/2 + 1/9) = 0.782. Standard errors for the remaining studies are provided in column 11 of Table 1.
It is common practice to calculate 95% confidence intervals for each study: if a trial were replicated many times and an interval calculated from each replication, about 95 out of every 100 such intervals would be expected to contain the true effect size. Hence, a 95% confidence interval provides a range in which one can be reasonably sure the true effect size lies. The formula for calculating a 95% confidence interval for a ln(RRR) is:
$$\ln(\mathrm{RRR}) \pm 1.96 \times \mathrm{SE}(\ln(\mathrm{RRR}))$$
For study 1 the ln(RRR) 95% confidence interval is given by -1.504 ± 1.96 × 0.782 = (-3.04, 0.03). Confidence intervals for RRR are obtained by taking the exponential of this ln(RRR) interval; hence, the RRR 95% confidence interval for study 1 is (0.05, 1.03). This interval includes 1, which indicates that on its own the trial is inconclusive, because both beneficial and harmful effect size estimates are included in the interval and are in some sense plausible. This highlights the need to consider the precision of the estimate: the study estimated a very large treatment effect, but did so very imprecisely, and the true effect could be much smaller (or larger) than the point estimate. The 95% confidence intervals for ln(RRR) and RRR for the remaining studies are provided in columns 12 and 13, respectively. To aid examination of the results of the individual studies, these intervals can be plotted on the same axis, as in Fig. 1. The RRR estimate for each study is plotted, with the size of the plotting symbol proportional to the precision of the estimate. The 95% confidence interval for each RRR estimate is also plotted (the more precise estimates having the narrower confidence intervals; other features of this figure will be explained in due course). This plot highlights the variability in the estimates and in the precisions between studies. The issue of variability between estimates from individual studies is considered further in later sections.
Table 1 Data and calculations for the hospital re-admissions meta-analysis
Column key: (1) study ID; (2) length of follow-up (months); experimental group: (3) patients (n), (4) re-admissions (n), (5) re-admission rate (per patient-month); control group: (6) patients (n), (7) re-admissions (n), (8) re-admission rate (per patient-month); (9) re-admission rate ratio (RRR); (10) ln(RRR); (11) SE(ln(RRR)); (12) 95% CI for ln(RRR); (13) 95% CI for RRR; (14) weight; (15) intervention administration; (16) EPOC quality measure; (17) number used in original report*
(1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) | (14) | (15) | (16) | (17)
1 | 1.5 | 37 | 2 | 0.036 | 37 | 9 | 0.162 | 0.222 | -1.504 | 0.782 | (-3.04, 0.03) | (0.05, 1.03) | 1.64 | Single | 5 | 53
2 | 3 | 464 | 102 | 0.073 | 439 | 102 | 0.077 | 0.946 | -0.055 | 0.140 | (-0.33, 0.22) | (0.72, 1.24) | 51.00 | Single | 3 | 59
3 | 6 | 499 | 347 | 0.116 | 502 | 340 | 0.113 | 1.027 | 0.026 | 0.076 | (-0.12, 0.18) | (0.88, 1.19) | 171.73 | Single | 6 | 60
4 | 6 | 86 | 36 | 0.070 | 87 | 26 | 0.050 | 1.401 | 0.337 | 0.257 | (-0.17, 0.84) | (0.85, 2.32) | 15.10 | Single | 4 | 69
5 | 12 | 57 | 9 | 0.013 | 56 | 6 | 0.009 | 1.474 | 0.388 | 0.527 | (-0.65, 1.42) | (0.52, 4.14) | 3.60 | Team | 3 | 82
6 | 2 | 39 | 29 | 0.372 | 41 | 35 | 0.427 | 0.871 | -0.138 | 0.251 | (-0.63, 0.35) | (0.53, 1.42) | 15.86 | Single | 3 | 88
7 | 3 | 20 | 3 | 0.050 | 20 | 13 | 0.217 | 0.231 | -1.466 | 0.641 | (-2.72, -0.21) | (0.07, 0.81) | 2.44 | Single | 6 | 177
8 | 3 | 142 | 554 | 1.300 | 140 | 868 | 2.067 | 0.629 | -0.463 | 0.054 | (-0.57, -0.36) | (0.57, 0.70) | 338.16 | Team | 4 | 187
9 | 6 | 695 | 343 | 0.082 | 701 | 310 | 0.074 | 1.116 | 0.110 | 0.078 | (-0.04, 0.26) | (0.96, 1.30) | 162.83 | Team | 6 | 222
10 | 2 | 178 | 43 | 0.121 | 176 | 37 | 0.105 | 1.149 | 0.139 | 0.224 | (-0.30, 0.58) | (0.74, 1.78) | 19.89 | Single | 2 | 228
11 | 6 | 30 | 9 | 0.050 | 30 | 6 | 0.033 | 1.500 | 0.405 | 0.527 | (-0.63, 1.44) | (0.53, 4.21) | 3.60 | Team | 5 | 231
12 | 6 | 96 | 42 | 0.073 | 97 | 62 | 0.107 | 0.684 | -0.379 | 0.200 | (-0.77, 0.01) | (0.46, 1.01) | 25.04 | Team | 3 | 236
13 | 3 | 303 | 104 | 0.114 | 300 | 109 | 0.121 | 0.945 | -0.057 | 0.137 | (-0.33, 0.21) | (0.72, 1.24) | 53.22 | Team | 4 | 275
14 | 6 | 150 | 51 | 0.057 | 99 | 32 | 0.054 | 1.052 | 0.051 | 0.226 | (-0.39, 0.49) | (0.68, 1.64) | 19.66 | Team | 4 | 283
15 | 1 | 20 | 4 | 0.200 | 20 | 6 | 0.300 | 0.667 | -0.405 | 0.645 | (-1.67, 0.86) | (0.19, 2.36) | 2.40 | Team | 1 | 312
16 | 1.5 | 29 | 4 | 0.092 | 25 | 9 | 0.240 | 0.383 | -0.959 | 0.601 | (-2.14, 0.22) | (0.12, 1.24) | 2.77 | Single | 4 | 334
17 | 12 | 333 | 396 | 0.099 | 335 | 410 | 0.102 | 0.972 | -0.029 | 0.070 | (-0.17, 0.11) | (0.85, 1.12) | 201.44 | Single | 3 | 339
18 | 3 | 140 | 18 | 0.043 | 136 | 16 | 0.039 | 1.093 | 0.089 | 0.344 | (-0.58, 0.76) | (0.56, 2.14) | 8.47 | Single | 4 | 351
19 | 9 | 418 | 495 | 0.132 | 417 | 549 | 0.146 | 0.899 | -0.106 | 0.062 | (-0.23, 0.02) | (0.80, 1.02) | 260.31 | Single | 3 | 397
20 | 6 | 62 | 21 | 0.056 | 58 | 35 | 0.101 | 0.561 | -0.578 | 0.276 | (-1.12, -0.04) | (0.33, 0.96) | 13.13 | Team | 4 | 403
21 | 12 | 199 | 107 | 0.045 | 205 | 111 | 0.045 | 0.993 | -0.007 | 0.135 | (-0.27, 0.26) | (0.76, 1.30) | 54.48 | Team | 3 | 416
22 | 12 | 63 | 22 | 0.029 | 60 | 30 | 0.042 | 0.698 | -0.359 | 0.281 | (-0.91, 0.19) | (0.40, 1.21) | 12.69 | Team | 4 | 691
23 | 6 | 35 | 10 | 0.048 | 40 | 51 | 0.213 | 0.224 | -1.496 | 0.346 | (-2.17, -0.82) | (0.11, 0.44) | 8.36 | Single | 4 | 1793
24 | 6 | 102 | 49 | 0.080 | 102 | 51 | 0.083 | 0.961 | -0.040 | 0.200 | (-0.43, 0.35) | (0.65, 1.42) | 24.99 | Single | 7 | 1796
25 | 6 | 140 | 24 | 0.029 | 97 | 29 | 0.050 | 0.573 | -0.556 | 0.276 | (-1.10, -0.02) | (0.33, 0.98) | 13.13 | Team | 3.5 | 2211
26 | 3 | 45 | 5 | 0.037 | 46 | 5 | 0.036 | 1.022 | 0.022 | 0.632 | (-1.22, 1.26) | (0.30, 3.53) | 2.50 | Team | 6 | 2229
27 | 4 | 49 | 11 | 0.056 | 51 | 7 | 0.034 | 1.636 | 0.492 | 0.483 | (-0.46, 1.44) | (0.63, 4.22) | 4.28 | Single | 3.5 | 2657
28 | 6 | 177 | 49 | 0.046 | 186 | 107 | 0.096 | 0.481 | -0.731 | 0.172 | (-1.07, -0.39) | (0.34, 0.67) | 33.61 | Single | 3 | 3632
29 | 3 | 381 | 154 | 0.135 | 381 | 197 | 0.172 | 0.782 | -0.246 | 0.108 | (-0.46, -0.04) | (0.63, 0.97) | 86.43 | Team | 4 | 3636
30 | 12 | 96 | 22 | 0.019 | 110 | 43 | 0.033 | 0.586 | -0.534 | 0.262 | (-1.05, -0.02) | (0.35, 0.98) | 14.55 | Single | 6 | 4460
*Parker et al. 2000. n = number.
Figure 1 Forest plot of 30 RCTs examining the effect on re-admission rates of interventions aimed at modifying the hospital discharge process for elderly people.
Combining effect sizes – calculating weighted averages
The previous section illustrated how an RRR estimate and corresponding standard error could be calculated from summary data extracted from individual study reports. In other instances different effect measures may be more appropriate, but the general principle that an estimate and SE are required from each study remains. When outcomes are reported on a binary scale, the odds ratio, risk ratio or risk difference measures are commonly used, while outcomes measured on a continuous scale can be combined directly, or standardized if different scales of measurement have been used in the individual studies. Descriptions and formulae for each of these outcome measures and others are available elsewhere (Fleiss 1993; Sutton et al. 2000c).
The simplest way to combine estimates is to average them. Since different studies estimate the true effect size with varying degrees of precision, a weighted average is used. The weight given to each study in the re-admissions meta-analysis is calculated by:
$$\text{weight} = \frac{1}{\mathrm{SE}(\ln(\mathrm{RRR}))^{2}}$$
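The per-study arithmetic above can be checked with a short sketch. It is written in Python purely for illustration (the paper itself presents only hand calculations), and the helper names `arm_rate` and `study_effect` are hypothetical. It assumes, as in Table 1, that both arms of a trial share the same follow-up period, and uses 1.96 as the Normal multiplier for a 95% interval.

```python
import math


def arm_rate(readmissions, patients, months):
    """Re-admission rate per patient-month in one trial arm."""
    return readmissions / (patients * months)


def study_effect(a_exp, n_exp, a_ctl, n_ctl, months, z=1.96):
    """Per-study quantities for a two-arm trial whose arms share one follow-up period."""
    rr_exp = arm_rate(a_exp, n_exp, months)
    rr_ctl = arm_rate(a_ctl, n_ctl, months)
    rrr = rr_exp / rr_ctl                      # re-admission rate ratio
    log_rrr = math.log(rrr)                    # analysed on the log scale
    se = math.sqrt(1 / a_exp + 1 / a_ctl)      # SE(ln(RRR)) from the event counts
    lo, hi = log_rrr - z * se, log_rrr + z * se
    return {
        "rr_exp": rr_exp,
        "rr_ctl": rr_ctl,
        "rrr": rrr,
        "log_rrr": log_rrr,
        "se": se,
        "ci_log": (lo, hi),
        "ci_rrr": (math.exp(lo), math.exp(hi)),  # back-transformed to the RRR scale
        "weight": 1 / se ** 2,                   # inverse-variance weight
    }


# Study 1 of Table 1: 2 re-admissions among 37 patients (experimental) and
# 9 among 37 (control), both followed for 1.5 months.
s1 = study_effect(2, 37, 9, 37, months=1.5)
for name, value in s1.items():
    print(name, value)
```

For study 1 this should reproduce, to rounding, the Table 1 entries: RRR 0.222, ln(RRR) -1.504, SE 0.782, RRR 95% CI of about (0.05, 1.03) and weight 1.64.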
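The same weights can be carried through to the pooled estimate for all 30 trials. The sketch below is again Python for illustration only, with hypothetical helper names (`log_rrr_and_var`, `pool`, `dersimonian_laird`, `summarize`). It forms the fixed-effect inverse-variance weighted average and, anticipating the random-effect model described under ‘Heterogeneity and random effect models’, also computes Cochran's Q and the DerSimonian-Laird moment estimate of the between-study variance. The paper does not state which estimator produced its random-effect results, so using DerSimonian-Laird here is an assumption. Follow-up is left out of the data because it cancels in the rate ratio when both arms share it.

```python
import math

# Studies 1-30 of Table 1: (re-admissions, patients) in the experimental arm,
# then in the control arm. Follow-up is omitted: both arms of a trial share it,
# so it cancels in the rate ratio.
STUDIES = [
    (2, 37, 9, 37), (102, 464, 102, 439), (347, 499, 340, 502),
    (36, 86, 26, 87), (9, 57, 6, 56), (29, 39, 35, 41),
    (3, 20, 13, 20), (554, 142, 868, 140), (343, 695, 310, 701),
    (43, 178, 37, 176), (9, 30, 6, 30), (42, 96, 62, 97),
    (104, 303, 109, 300), (51, 150, 32, 99), (4, 20, 6, 20),
    (4, 29, 9, 25), (396, 333, 410, 335), (18, 140, 16, 136),
    (495, 418, 549, 417), (21, 62, 35, 58), (107, 199, 111, 205),
    (22, 63, 30, 60), (10, 35, 51, 40), (49, 102, 51, 102),
    (24, 140, 29, 97), (5, 45, 5, 46), (11, 49, 7, 51),
    (49, 177, 107, 186), (154, 381, 197, 381), (22, 96, 43, 110),
]


def log_rrr_and_var(a_exp, n_exp, a_ctl, n_ctl):
    """ln(RRR) and its variance, SE(ln(RRR)) squared, for one study."""
    log_rrr = math.log((a_exp / n_exp) / (a_ctl / n_ctl))
    return log_rrr, 1 / a_exp + 1 / a_ctl


def pool(estimates, variances, tau2=0.0):
    """Inverse-variance weighted average and its SE.

    tau2 = 0 gives the fixed-effect estimate; tau2 > 0 adds the between-study
    variance to each study's variance, giving random-effect weights.
    """
    weights = [1 / (v + tau2) for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total)


def dersimonian_laird(estimates, variances):
    """Cochran's Q, its degrees of freedom and the DerSimonian-Laird tau^2."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    fixed, _ = pool(estimates, variances)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = total - sum(w * w for w in weights) / total
    return q, df, max(0.0, (q - df) / c)


def summarize(label, pooled, se, z=1.96):
    """Print and return the pooled RRR and its 95% CI on the RRR scale."""
    est, lo, hi = (math.exp(x) for x in (pooled, pooled - z * se, pooled + z * se))
    print(f"{label}: RRR {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    return est, lo, hi


ys, vs = zip(*(log_rrr_and_var(*s) for s in STUDIES))
fe_fit = pool(ys, vs)
q, df, tau2 = dersimonian_laird(ys, vs)
re_fit = pool(ys, vs, tau2)
fe_summary = summarize("Fixed effect", *fe_fit)
re_summary = summarize("Random effect", *re_fit)
print(f"Q = {q:.1f} on {df} df; tau^2 = {tau2:.3f}")
```

Run as written, the printed results should be close to the fixed-effect RRR of 0.85 (0.81–0.89), the random-effect RRR of 0.83 (0.73–0.93) and the between-study variance of about 0.057 reported in this paper. The sketch works from the raw counts rather than the rounded table entries, so the last decimal place may differ slightly.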
The square of the standard error is often known as the variance, so combining studies using this weighting is often called the inverse variance-weighted method (Fleiss 1993). The weightings for each study are provided in Table 1, column 14. If an effect measure other than the RRR is being used, then the weightings are calculated by the same principle, using the inverse of the variance of that effect measure.
Once the weight for each study has been calculated, a pooled estimate of ln(RRR) is calculated by multiplying each study’s weight by its ln(RRR), summing the resulting values, and then dividing this sum by the sum of the weights. Using figures from Table 1, the outline calculation for the re-admissions data is:
$$\ln(\text{pooled RRR}) = \frac{1.64 \times (-1.504) + \dots + 14.55 \times (-0.534)}{1.64 + \dots + 14.55} = -0.164$$
The variance of ln(pooled RRR) (or of any other effect measure used) is then calculated by taking the reciprocal of the sum of the weights:
$$\mathrm{var}(\ln(\text{pooled RRR})) = \frac{1}{1.64 + \dots + 14.55} = 0.0006$$
Using these figures, an approximate 95% confidence interval for the pooled estimate can be calculated in the same manner as confidence intervals were produced for the individual study estimates above. The pooled estimate of RRR for the re-admissions dataset is 0.85 with 95% CI (0.81–0.89), indicating a modest, statistically significant treatment benefit at the 5% level. This estimate is plotted using a diamond shape in Fig. 1 directly below the 30 individual studies. Figure 1 is often called a forest plot and is commonly used to display the results of a meta-analysis.
This approach is often known as a fixed-effect approach, to distinguish it from the random-effect models described below. It can be used to combine outcomes on any scale; however, other related fixed-effect methods specifically for combining odds ratios also exist (Fleiss 1993; Sutton et al. 2000c). These fixed-effect methods all make the strong assumption that each study is estimating the same underlying treatment effect. Many people feel that in medical and related research such an assumption is unrealistic (Thompson 1993) because studies are never identical replications of one another, and study design and conduct differences will inevitably have some degree of influence on study outcome. Models which account for underlying variability in the treatment effect estimates are considered in the next section.
Heterogeneity and random effect models
When performing a meta-analysis, although the overall aim may be to produce an overall pooled estimate of treatment effect, it is crucial to assess the variation between results of the primary studies and, if possible, to investigate why they differ. Clearly, it would be remarkable if all studies being meta-analysed produced exactly the same treatment effect estimate. Some variation in results is expected, due simply to the play of chance; this is often called random variation. However, if effect size estimates vary between studies to a greater extent than expected on the basis of chance alone, the studies are considered to be heterogeneous, and it is necessary to account for the extra variation, above that expected by chance, in the meta-analysis model. This is usually done through the use of a random-effect model. Essentially, this relaxes the assumption that each study is estimating exactly the same underlying treatment effect, and instead assumes that the underlying effect sizes are drawn from a distribution of effect sizes. This distribution is usually assumed to be Normal, with a variance determined by the data. In practical terms, accounting for between-study heterogeneity in this way produces a pooled point estimate which is often (but not always) similar to the one produced by fixed-effect methods. However, taking into account between-study heterogeneity produces a wider 95% confidence interval, so the estimate is more conservative.
The whole issue of appropriateness and suitability of fixed- and random-effect models for meta-analysis has been much discussed (Thompson 1993; Peto 1987). A test for heterogeneity exists (Fleiss 1993), and the result of this test can then be used to inform model choice: if it is non-significant a fixed-effect model is used, and if it is significant a random-effect model is used. This seemingly sensible
approach has a flaw because the test has low power. desirable than using random-effect models to allow
This implies that heterogeneity may exist even when for heterogeneity is to try to explain the heterogene-
it produces a non-significant result (Boissel et al. ity. This may lead to the identification of associations
1989). An alternative approach is to always use a between study or patient characteristics and the
random-effect model. The inflation of the confidence outcome measure, which would not have been pos-
interval is dictated by the degree of variation sible in single studies. This may lead in turn to clini-
between studies, so when between-study variation is cally important findings and may eventually assist in
small the inflation will be negligible, producing a individualizing treatment regimes (Lau et al. 1998).
result which would be very similar to the fixed-effect Both subgroup analyses and regression methods can
approach. be used to do this.
A detailed description of the random-effect meta- Potential study level factors, pertaining to either
analysis model is beyond the scope of this paper, but study design or patient characteristics which could
clear accounts are given elsewhere (DerSimonian & affect study results should ideally be identified before
Laird 1986; Shadish & Haddock 1994). Combining a meta-analysis is conducted. If this is carried out,
the 30 studies evaluating interventions to prevent re- data on these factors can then be obtained at the data
admission using a random-effect model produces a extraction stage of a review, and such explicit a priori
RRR of 0.83 (0.73–0.93). This estimate is plotted specification also reduces the temptation of ‘data
below the fixed-effect one in Fig. 1. The estimate of dredging’.
the between-study variance is 0.057, which is quite Returning to the re-admission dataset, one poten-
small but non-negligible (the test for between- tial factor which could affect results is whether the
study heterogeneity is highly significant (P < 0.001)). intervention was administered by a team or an
Accounting for this heterogeneity has produced a individual. This information is given for each study
wider confidence interval compared to the fixed- in column 15 of Table 1. In 16 of the studies the
effect approach, which is a typical finding. Modifica- intervention was administered by an individual
tions to the way the parameters in a random-effect and in 14 it was administered by a team. Separate
meta-analysis model are calculated have been devel- meta-analyses can be performed for these two sub-
oped (Hardy & Thompson 1996; Biggerstaff & groups in an attempt to see if the effectiveness of
Tweedie 1997). One of these should be used if the the intervention depends on whether an individual
number of studies in the meta-analysis is small or team implements it, and whether between study
(approximately less than 10) as it overcomes prob- heterogeneity is reduced in the subgroups. Pooled
lems with a previous simplification in the model cal- estimates for these subgroups turn out to be almost
culations, which can be important in meta-analyses of identical. The intervention administered by indi-
small numbers of studies. vidual subgroup has a RRR of 0.83 (0.70–0.97) and
A final point concerning between study hetero- the estimate of the between-study heterogeneity
geneity is that there is little explicit guidance to offer of 0.056 (test for heterogeneity highly significant at
regarding the point at which studies estimates should P < 0.001). For the studies where the intervention
not be pooled at all because heterogeneity is deemed was administered by a team the RRR was 0.83
too great, but alternative approaches are discussed (0.69–0.99) and the estimate of between-study
below. heterogeneity 0.062 (test for heterogeneity highly
significant at P < 0.001). Hence, it would appear
that whether the intervention is administered by an
individual or a team makes very little difference to
Exploring and explaining heterogeneity
the effectiveness of the intervention and, hence, does
Until now, the impression has been given that het- not explain any of the variation between study
erogeneity is a nuisance factor which needs account- results.
ing for when performing a meta-analysis. However, If the factor of interest is measured on a continu-
investigating why between-study variation exists ous scale, or dummy indicator variables are created
offers the meta-analyst unique opportunities. More for the levels of categorical factors, then meta-
© 2001 Blackwell Science, Journal of Evaluation in Clinical Practice, 7, 2, 135–148 141
- 8. A.J. Sutton et al.
regression can be used to explore their impact. Meta-
Publication and related biases
regression models are very similar in principle to
ordinary simple linear regression models, the main Publication bias exists because research with statisti-
differences being that individual observations (the cally significant or interesting results is potentially
primary studies), unlike individual patients, are not more likely to be submitted, published or published
given equal weight in the analysis (i.e. study should more rapidly than work with null or non-significant
be weighted according to its precision). Addition- results (Song et al. 2000). When only the published
ally, it may be desirable to include a random-effect literature is included in a meta-analysis, this can
term to account for residual heterogeneity not potentially lead to biased over-optimistic conclu-
explained by the covariate(s); such a model can be sions. Related biases which can also bias the results
thought of as an extension to the random-effect of a meta-analysis include (i) pipeline bias, when sig-
model described above (Berkey et al. 1995). An nificant results are published quicker than non-sig-
example of a meta-regression analysis is given in nificant ones; and (ii) language bias, when researchers
section 3. whose native tongue is not English are more likely to
Meta-regression techniques are currently used publish their non-significant results in non-English
relatively rarely, and the authors believe not to written journals, but are more likely to publish their
their full potential, but examples are emerging significant results in English. If this happens, a meta-
(Freemantle et al. 1999; von Dadelszen et al. 2000). Although powerful, such analyses do have their limitations. Regression analyses of this type are also susceptible to aggregation bias, which occurs if the relation between study means of patient characteristics and study outcomes does not directly reflect the relation between individuals’ values and individuals’ outcomes (Greenland 1987). Additionally, meta-regression analyses are often limited by the number of studies included in the meta-analysis. Special regression models have also been developed to explore the effect of patients’ underlying risk on intervention effect (Senn et al. 1996; Walter 1997); these are necessary to avoid producing incorrect results when exploring the effect of such a factor (Schmid et al. 1998; Senn et al. 1996).

3 Threats to the validity of a meta-analysis

Although meta-analyses are often considered to provide the highest grade of evidence available regarding the effectiveness of an intervention, higher than an individual trial, it should not be forgotten that they are a type of observational study. As such they are open to biases which may threaten their validity. Perhaps the two most serious problems which can potentially lead to biased estimates are publication bias and variable quality of the primary studies. These two issues are considered further below.

analysis including only study reports in English may be based on a biased collection of studies. Perhaps an appropriate term which includes all these sources of bias is ‘dissemination bias’ (Song et al. 2000). Long-term initiatives to alleviate the problem of publication bias have commenced. These include trial amnesties (Horton 1997) to encourage publication of previously unpublished trials, and the creation of registries for prospective registration of trials (Horton & Smith 1999). However, the issue is currently still a big concern for researchers carrying out meta-analyses. There are certain measures which can be taken to assess the presence and minimize the impact of publication bias in a meta-analysis dataset. Currently, however, there is much debate, and some dispute, as to the approach researchers should take to deal with publication bias in meta-analyses.

The presence of publication bias in a meta-analysis dataset can be assessed informally by inspection of a funnel plot (Light & Pillemer 1984). This plots the effect size for each study against some measure of its precision, e.g. 1/standard error of the effect size. The resulting plot should be shaped like a funnel if no publication bias is present. This shape is expected because trials of decreasing size have increasingly large variation in their effect size estimates, as random variation becomes increasingly influential. However, if the chance of publication is greater for larger trials or trials with statistically significant results, some small non-significant studies may not appear in the literature.
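The Python sketch below illustrates the funnel-plot idea described above. For a set of studies, it computes the plotting coordinates: effect size on a log scale against precision (1/standard error). It also computes pseudo 95% limits, pooled estimate ± 1.96 × SE, which outline the funnel expected in the absence of bias. Several choices here are illustrative assumptions, not the authors' procedure or the re-admissions data: the helper name `funnel_coordinates`, the use of a fixed-effect inverse-variance pooled estimate as the funnel's centre, and the example numbers. No plotting library is assumed.

```python
import math


def funnel_coordinates(log_effects, std_errors, z=1.96):
    """Hypothetical helper (not from the paper): funnel plot coordinates.

    Returns (pooled, points). `pooled` is the inverse-variance (fixed-effect)
    weighted mean of the log effects. Each entry of `points` holds a study's
    log effect (x-axis), its precision 1/SE (y-axis) and pseudo confidence
    limits pooled +/- z*SE that trace the outline of the funnel.
    """
    if not log_effects or len(log_effects) != len(std_errors):
        raise ValueError("need non-empty, equal-length inputs")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)

    points = []
    for y, se in zip(log_effects, std_errors):
        points.append({
            "log_effect": y,
            "precision": 1.0 / se,
            # Limits widen as SE grows, so small (imprecise) studies are
            # expected to scatter more widely near the bottom of the plot.
            "lower": pooled - z * se,
            "upper": pooled + z * se,
        })
    return pooled, points


# Made-up rate ratios and standard errors of ln(rate ratio) for four
# hypothetical trials (NOT the re-admissions dataset).
ratios = [0.6, 0.9, 0.4, 1.1]
ses = [0.1, 0.2, 0.5, 0.4]
pooled, pts = funnel_coordinates([math.log(r) for r in ratios], ses)
print(f"pooled rate ratio: {math.exp(pooled):.2f}")
for r, p in zip(ratios, pts):
    print(f"RR={r:.2f}  1/SE={p['precision']:.1f}  "
          f"limits=({math.exp(p['lower']):.2f}, {math.exp(p['upper']):.2f})")
```

To draw the funnel, plot `precision` against `log_effect` with any charting tool. Without publication bias, the points would be expected to scatter roughly symmetrically between the limits. Selective non-publication of small non-significant trials tends to leave one of the lower corners empty.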
When small non-significant studies go unpublished, trials are missing from one corner of the plot. This is the bottom right-hand corner when an ‘undesirable’ outcome such as the re-admission rate is being considered, and the result is a degree of asymmetry in the funnel. A funnel plot for the 30 RCTs in the re-admissions dataset is provided in Fig. 2. Visual inspection would suggest that there is little evidence of publication bias in this dataset. However, there are a few small studies with extremely beneficial RRRs at the bottom left-hand corner of the plot, for which there are no symmetric counterparts with extremely unfavourable RRRs in the bottom right-hand corner.

[Figure 2: Funnel plot of studies included in the hospital discharge meta-analysis examining the effect of interventions on re-admission rates. x-axis: re-admission rate ratio (log scale); y-axis: 1/standard error of ln(RRR).]

Publication bias can be tested for more formally using statistical tests which are based on the same symmetry assumptions as a funnel plot assessment (Begg & Mazumdar 1994; Egger et al. 1997; Duval & Tweedie 1998). One formal test (Egger et al. 1997) produces a non-significant P-value of 0.57 for the re-admissions dataset, which is consistent with the inconclusive visual assessment.

Disagreement exists about how to proceed if publication bias is suspected after an assessment for its presence has been made. Methods to assess the likely impact of publication bias on the pooled outcome estimate have been developed (Duval & Tweedie 1998; Givens et al. 1997; Copas 1999; Song et al. 2000). They are not widely used, however. This is partly because many are complex and hence difficult to implement, and partly because of concerns about their applicability. We believe that the use of such methods as part of a sensitivity analysis is sensible (Sutton et al. 2000a), but more research is needed in this area.

Study quality

It is rare that all the studies available for a meta-analysis are of uniformly high quality. More likely there will be a range in the quality of the research pertaining to the intervention of interest. Restricting a meta-analysis to include only RCTs is a safeguard taken by such groups as the Cochrane Collaboration, in an attempt to include only evidence which potentially produces the least biased results. Restricting analyses only to RCTs does not guarantee that the meta-analysis will produce an unbiased result, however, as there can still be methodological flaws in the design, conduct and analysis of a trial. Clearly, the inclusion of poor or flawed studies in a meta-analysis may be problematic. Their influence may bias the pooled result and may even lead the meta-analysis to the wrong qualitative conclusions. Unfortunately, most studies are flawed to some degree. Excluding all but ‘perfect’ studies may therefore leave the meta-analyst with few if any data, and in some fields such studies may not be possible to conduct because of ethical or practical constraints. The problem of dealing with study quality in a meta-analysis is similar to that for publication bias. There is agreement that some assessment of quality should always be made, but little consensus on how to make such an assessment, or how to incorporate the results into the meta-analysis.

There have been many scales and checklists developed to aid in the assessment of study quality (Moher et al. 1995), but many of them have come under heavy criticism for not being constructed scientifically (Moher et al. 1999a). Further, recent work has demonstrated that different results can be obtained in a meta-analysis depending on the checklist used (Juni et al. 1999). A further problem is that it is often difficult to ascertain all the required details of the trial from a study report (Begg et al. 1996). Often, this means that an assessment of the trial report, and not of the trial itself, is in effect being made. The underlying problem with the use of a scale or checklist is that it is impossible to predict which design aspects cause the most bias and, more fundamentally, it is often impossible to predict even the direction in
which any bias will be acting (Schulz et al. 1995). This makes the direct adjustment of study estimates for study quality impossible.

Several ways in which study quality can be incorporated into a meta-analysis have been suggested. Perhaps the simplest is to use a quality threshold to include or exclude studies. This could be defined using a cut-off value on a particular quality scale, or as a requirement of having several design aspects present. A further possibility is to use a quality score to weight study results, or to incorporate such a score into the standard precision weightings (Berard & Bravo 1998). Finally, an approach which appears to be gaining support is the exploration of quality via meta-regression. In such an approach a quality score, or individual markers of study quality such as the degree of blinding or method of treatment allocation, are included in a regression model as explanatory variables. Examining individual markers of quality separately eliminates the problems with the somewhat arbitrary construction of quality scale scoring systems (Detsky et al. 1992).

Returning to the re-admissions meta-analysis, study quality was rated crudely using a count of the effective practice and organization of care (EPOC) quality criteria (Cochrane Effective Practice & Organization of Care Review Group 1998) that were satisfied by each study. The scores obtained by each trial using this method are given in the penultimate column of Table 1. When these scores are included in a random effect regression model, the equation $\ln(\mathrm{RRR}) = -0.22 + 0.007 \times \text{quality score}$ is obtained. Fig. 3 plots this regression line together with the primary studies; the size of each plotting symbol is proportional to the precision of the effect size estimate. The quality score coefficient is small and not statistically significant (P = 0.88). Study quality, at least as measured in this way, therefore does not appear to affect the study results systematically, or to explain the between-study heterogeneity.

[Figure 3: Regression line examining the impact of quality score, using a random effect meta-regression model for re-admission rate in the hospital discharge meta-analysis. x-axis: EPOC quality score (1–7); y-axis: re-admission rate ratio (log scale).]

4 Further developments in methods of meta-analysis

Specialist meta-analysis methods

While section 2 provided a summary of the most commonly used methods in meta-analysis, many further developments have been made. A proportion of these focus on the synthesis of less standard data types. For example, specialist methods are required to pool the results of diagnostic tests because two outcomes, sensitivity and specificity, require simultaneous consideration (Irwig et al. 1995). Another area which requires special methods is the analysis of survival data, because account has to be taken of censored observations (Dear 1994). Other data types for which specialist methods have been developed are dose–response data (Tweedie & Mengersen 1995) and economic data (Jefferson et al. 1996). Individual patient data meta-analysis (Stewart & Clarke 1995) pools the original study datasets rather than relying on published summary data. It has been described as the gold standard and is considered by some to be the only way to carry out a meta-analysis of survival data. It is, however, much more time-consuming and costly than meta-analysis of summary data. It is currently unclear whether the extra effort required is worthwhile. For an overview of these and further meta-analytical developments see Sutton et al. (2000c).

New directions for meta-analysis using Bayesian statistics

In addition to the above developments, more advanced methods for synthesis of information have been developed. Although not currently used routinely, these provide potentially more powerful and flexible tools for synthesizing evidence. Many of these methods use Bayesian statistics, in contrast to the more commonly employed classical approach.

A full description of Bayesian methods is not possible here, but for a recent review of their use in assessing health technologies see Spiegelhalter et al. (2000a). The key element of the Bayesian approach is that it introduces the idea of subjective probability (O’Hagan 1988). This contrasts with the objective probabilities traditionally attached to specific, often repeatable, events. Before carrying out a piece of research, an investigator would have formed some prior beliefs regarding its outcome, possibly derived from the results of previous research in the same field. These a priori beliefs are combined with the data from the current investigation to produce results which reflect the researchers’ beliefs having conducted the research. These posterior beliefs are calculated by combining the prior beliefs with the new data using Bayes’ Theorem, which forms the backbone of all Bayesian analysis.

The advantages of using such an approach are often subtle, but important. Perhaps most notable in a health-care context is the ability to make direct probability statements regarding quantities of interest, for example the probability that patients receiving drug A have better survival than those who receive drug B. There are good reasons, however, why the Bayesian approach has largely been neglected in routine use. The most serious is that, generally, the computations required in Bayesian models are very complex. Additionally, expressing prior beliefs in a form which can be included in an analysis is a non-trivial task. Excitingly, many of the computational difficulties have been addressed recently through the development of specialist software, most notably WinBUGS (Spiegelhalter et al. 2000b). The problem of expressing prior beliefs remains. There are, however, practical ‘solutions’, including ‘off-the-shelf’ priors, which can express a range of degrees of prior knowledge and can be used in a sensitivity analysis. Use of ‘vague priors’, which essentially means prior information is ignored, is also possible.

The new WinBUGS software is able to perform the calculations required for a wide range of Bayesian analyses. The user has freedom to implement many existing meta-analysis methods developed classically and, more importantly, to develop models not possible using more traditional classical software. This has potentially huge benefits for synthesizing information. It also builds on earlier pioneering work by Eddy et al. (1992), whose ‘new’ graphical approach to meta-analysis can now be implemented using WinBUGS (Spiegelhalter et al. 2000b). Issues being addressed by these methods are outlined below:
1 Data from an RCT may be of direct interest, but not of a form which can simply be included in a meta-analysis. For example, data may be available from an RCT which uses the intervention of interest in the treatment arm, but a different intervention from the other studies in the control arm. Methods to include such data have been developed (Higgins & Whitehead 1996).
2 In some assessments, considering only randomized evidence may not be the optimal approach. Observational studies, which could potentially be very large, providing valuable data on thousands of patients, may be available. It may sometimes seem unjust to exclude these from a meta-analysis, particularly if they are of high quality, as they may have particular strengths and weaknesses, different from those of randomized studies (Droitcour et al. 1993). Special methods have been developed to account for different study designs in a meta-analysis (Prevost et al. 2000; Larose & Dey 1997). In other instances data on the effect of a drug of interest in animals may be available and provide valuable information which can be incorporated (DuMouchel & Harris 1983).
3 There may be benefits to including information from previous trials or meta-analyses on similar topics using similar interventions and outcome measures (Higgins & Whitehead 1996).
4 A study may not provide any quantitative data at all, being qualitative in design, but these qualitative data may be of direct relevance to the topic under assessment (Roberts et al. 1998).

Bayesian modelling gives us the potential to include all these types of data in a variety of ways, either by direct input into the model or through the specification of prior beliefs.

Other new approaches to meta-analysis have been suggested, but the corresponding methodology is at the conceptual rather than practical stage of development. Rubin (1992) has advocated extending meta-regression to the simultaneous modelling of multiple scientific factors. The intention is to produce a response surface of treatment effects rather than a single pooled result. This may allow a more detailed examination of the science underlying the results synthesized (Rubin 1992; Lau et al. 1998). Further, it may be possible to model different aspects of the processes under study separately. Suppose, for example, one were interested in the effect of lowering cholesterol on clinical outcomes. In a first stage of the analysis, data on the degree to which different interventions lower cholesterol levels could be synthesized. Then, in a second stage, the relationship between cholesterol level and various clinical outcomes could be modelled (Katerndahl & Lawler 1999). A further use of Bayesian modelling could allow meta-analysis to be placed within a decision-theoretic framework (Berger 1980). Such a framework can also take utilities into account when making health care or policy decisions (Midgette et al. 1994).

However, there is no magic wand to make all this happen. While Bayesian modelling provides flexibility and a framework, it does not dictate how models should be specified, how data should be incorporated, or how priors should be elicited. Much methodological work is required to develop further the ideas outlined above.

5 Conclusion

Much has been written on meta-analysis and the synthesis of evidence within the medical literature over the past two decades. During this time, the basic synthesis of effect measures using weighted averages has been refined to a high degree, and much of the methodology required to do so is in place for most situations encountered. Threats to the validity of meta-analysis exist, and the methods for dealing with problems such as publication bias and variations in quality of the primary studies are at a less refined stage. Additionally, many consider the standard ‘weighted average approach’ to meta-analysis not to be ‘state of the art’ in at least some situations. In these situations, more sophisticated methods, generally used to synthesize a broader base of evidence, would be advantageous. Currently, such approaches are still firmly in the experimental stage, and unfortunately ideas which sound natural and appealing are often difficult to implement in practice. Clearly, it will be some time before they are used routinely, but significant steps have been made. Moving the synthesis of evidence beyond calculating simple averages is timely, feasible and, indeed, essential.

Acknowledgements

The research on which this paper is based was funded, in part, by the NHS Research and Development Health Technology Assessment Programme (Methodology Project Numbers 93/52/3 & 95/09/03).

References

Begg C., Cho M., Eastwood S., Horton R., Moher D. & Olkin I. (1996) Improving the quality of reporting of randomised controlled trials: the CONSORT statement. Journal of the American Medical Association 276, 637–639.
Begg C.B. & Mazumdar M. (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50, 1088–1101.
Berard A. & Bravo G. (1998) Combining studies using effect sizes and quality scores: application to bone loss in postmenopausal women. Journal of Clinical Epidemiology 51, 801–807.
Berger J.O. (1980) Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer-Verlag, New York.
Berkey C.S., Hoaglin D.C., Mosteller F. & Colditz G.A. (1995) A random-effects regression model for meta-analysis. Statistics in Medicine 14, 395–411.
Biggerstaff B.J. & Tweedie R.L. (1997) Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statistics in Medicine 16, 753–768.
Boissel J.P., Blanchard J., Panak E., Peyrieux J.C. & Sacks H. (1989) Considerations for the meta-analysis of randomized clinical trials: summary of a panel discussion. Controlled Clinical Trials 10, 254–281.
Cochrane Effective Practice and Organisation of Care Review Group (1998) The Data Collection Checklist. University of Aberdeen, HSRU, Aberdeen.
Copas J. (1999) What works?: selectivity models and meta-analysis. Journal of the Royal Statistical Society, Series A 161, 95–105.
von Dadelszen P., Ornstein M.P., Bull S.B., Logan A.G., Koren G. & Magee L.A. (2000) Fall in mean arterial pressure and fetal growth restriction in pregnancy hypertension: a meta-analysis. Lancet 355, 87–92.
Dear K.B.G. (1994) Iterative generalized least squares for meta-analysis of survival data at multiple times. Biometrics 50, 989–1002.
Deeks J., Glanville J. & Sheldon T. (1996) Undertaking systematic reviews of research on effectiveness: CRD guidelines for those carrying out or commissioning reviews. Report no. 4. Centre for Reviews and Dissemination. York Publishing Services Ltd, York.
DerSimonian R. & Laird N. (1986) Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177–188.
Detsky A.S., Naylor C.D., O’Rourke K., McGeer A.J. & L’Abbé K.A. (1992) Incorporating variations in the quality of individual randomized trials into meta-analysis. Journal of Clinical Epidemiology 45, 255–265.
Dickersin K., Scherer R. & Lefebvre C. (1994) Systematic reviews – identifying relevant studies for systematic reviews. British Medical Journal 309, 1286–1291.
Droitcour J., Silberman G. & Chelimsky E. (1993) Cross-design synthesis: a new form of meta-analysis for combining results from randomized clinical trials and medical-practice databases. International Journal of Technology Assessment in Health Care 9, 440–449.
DuMouchel W.H. & Harris J.E. (1983) Bayes methods for combining the results of cancer studies in humans and other species (with comment). Journal of the American Statistical Association 78, 293–308.
Duval S. & Tweedie R. (1998) Practical estimates of the effect of publication bias in meta-analysis. Australasian Epidemiologist 5, 14–17.
Eddy D.M., Hasselblad V. & Shachter R. (1992) Meta-Analysis by the Confidence Profile Method. Academic Press, San Diego.
Egger M., Smith G.D., Schneider M. & Minder C. (1997) Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315, 629–634.
Fleiss J.L. (1993) The statistical basis of meta-analysis. Statistical Methods in Medical Research 2, 121–145.
Fleiss J.L. (1994) Measures of effect size for categorical data. In The Handbook of Research Synthesis (eds H. Cooper & L.V. Hedges), pp. 245–260. Russell Sage Foundation, New York.
Freemantle N., Cleland J., Young P., Mason J. & Harrison J. (1999) β-Blockade after myocardial infarction: systematic review and meta regression analysis. British Medical Journal 318, 1730–1737.
Givens G.H., Smith D.D. & Tweedie R.L. (1997) Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statistical Science 12, 221–250.
Greenland S. (1987) Quantitative methods in the review of epidemiologic literature. Epidemiologic Reviews 9, 1–30.
Hardy R.J. & Thompson S.G. (1996) A likelihood approach to meta-analysis with random effects. Statistics in Medicine 15, 619–629.
Higgins J.P.T. & Whitehead A. (1996) Borrowing strength from external trials in a meta-analysis. Statistics in Medicine 15, 2733–2749.
Hollis S. & Campbell F. (1999) What is meant by intention to treat analysis? Survey of published randomised controlled trials. British Medical Journal 319, 670–674.
Horton R. (1997) Medical editors trial amnesty. Lancet 350, 756.
Horton R. & Smith R. (1999) Time to register randomised trials – the case is now unanswerable. British Medical Journal 319, 865–866.
Hunt M. (1997) How Science Takes Stock: the story of meta-analysis. Russell Sage Foundation, New York.
Irwig L., Macaskill P., Glasziou P. & Fahey M. (1995) Meta-analytic methods for diagnostic test accuracy. Journal of Clinical Epidemiology 48, 119–130.
Jefferson T., Mugford M., Gray A. & DeMicheli V. (1996) An exercise in the feasibility of carrying out secondary economic analysis. Health Economics 5, 155–165.
Juni P., Witschi A., Bloch R. & Egger M. (1999) The hazards of scoring the quality of clinical trials for meta-analysis. Journal of the American Medical Association 282, 1054–1060.
Katerndahl D.A. & Lawler W.R. (1999) Variability in meta-analytic results concerning the value of cholesterol reduction in coronary heart disease: a meta-meta-analysis. American Journal of Epidemiology 149, 429–441.
Larose D.T. & Dey D.K. (1997) Grouped random effects models for Bayesian meta-analysis. Statistics in Medicine 16, 1817–1829.
Lau J., Ioannidis J.P. & Schmid C.H. (1998) Summing up evidence: one answer is not always enough. Lancet 351, 123–127.
Light R.J. & Pillemer D.B. (1984) Summing Up: the science of reviewing research. Harvard University Press, Cambridge, MA.
Midgette A.S., Wong J.B., Beshansky J.R., Porath A., Fleming C. & Pauker S.G. (1994) Cost-effectiveness of streptokinase for acute myocardial infarction – a combined meta-analysis and decision analysis of the effects of infarct location and of likelihood of infarction. Medical Decision Making 14, 108–117.
Moher D., Cook D.J., Eastwood S., Olkin I., Rennie D. & Stroup D. for the QUOROM Group (1999b) Improving the quality of reporting of meta-analysis of randomised controlled trials: the QUOROM statement. Lancet 354, 1896–1900.
Moher D., Jadad A.R., Nichol G., Penman M., Tugwell P. & Walsh S. (1995) Assessing the quality of randomized controlled trials – an annotated bibliography of scales and checklists. Controlled Clinical Trials 12, 62–73.
Moher D., Klassen T.P., Jadad A.R., Tugwell P., Moher M. & Jones A.L. (1999a) Assessing the quality of randomised controlled trials: implications for the conduct of meta-analyses. Health Technology Assessment 3(12), 1–98.
O’Hagan A. (1988) Probability: methods and measurement. Chapman & Hall, London.
Oxman A.D. (1996) The Cochrane Collaboration Handbook: preparing and maintaining systematic reviews, 2nd edn. Cochrane Collaboration, Oxford.
Parker S.G., Peet S.M., McPherson A., Cannaby A.M., Baker R., Wilson A., Lindesay J., Parker G., Abrams K.R. & Jones D.R. (2001) A systematic review of discharge arrangements for older people. Health Technology Assessment (in press).
Peto R. (1987) Why do we need systematic overviews of randomised trials? Statistics in Medicine 6, 233–240.
Prevost T.C., Abrams K.R. & Jones D.R. (2000) Hierarchical models in generalized synthesis of evidence: an example based on studies of breast cancer screening. Statistics in Medicine 19, 3359–3376.
Roberts K.A., Jones D.R., Abrams K.R., Dixon-Woods M. & Fitzpatrick R. (1998) Meta-analysis of qualitative and quantitative evidence: an example based on studies of patient satisfaction. Technical Report 98–01, University of Leicester: Department of Epidemiology and Public Health, Leicester.
Rubin D. (1992) A new perspective. In The Future of Meta-Analysis (eds K.W. Wachter & M.L. Straf), pp. 155–165. Russell Sage Foundation, New York.
Schmid C.H., Lau J., McIntosh M.W. & Cappelleri J.C. (1998) An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials. Statistics in Medicine 17, 1923–1942.
Schulz K.F., Chalmers I., Hayes R.J. & Altman D.G. (1995) Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. Journal of the American Medical Association 273, 408–412.
Senn S., Sharp S., Thompson S. & Altman D. (1996) Relation between treatment benefit and underlying risk in meta-analysis. British Medical Journal 313, 1550–1551.
Shadish W.R. & Haddock C.K. (1994) Combining estimates of effect size. In The Handbook of Research Synthesis (eds H. Cooper & L.V. Hedges), pp. 261–284. Russell Sage Foundation, New York.
Song F., Easterwood A., Gilbody S., Duley L. & Sutton A.J. (2000) Publication and other selection biases in systematic reviews. Health Technology Assessment 4(10), 1–115.
Spiegelhalter D.J., Myles J.P., Jones D.R. & Abrams K.R. (2000a) Bayesian methods in health technology assessment. Health Technology Assessment 4(38), 1–142.
Spiegelhalter D.J., Thomas A. & Best N.G. (2000b) WinBUGS Version 1.2 User Manual. MRC Biostatistics Unit, Cambridge.
Sterne J.A.C., Egger M. & Sutton A.J. (2001) Meta-analysis software. In Systematic Reviews in Health Care: meta-analysis in context, 2nd edn (eds M. Egger, G. Davey Smith & D.G. Altman), pp. 336–346. BMJ Books, London.
Stewart L.A. & Clarke M.J. (1995) Practical methodology of meta-analyses (overviews) using updated individual patient data. Cochrane Working Group. Statistics in Medicine 14, 2057–2079.
Sutton A.J., Abrams K.R., Jones D.R., Sheldon T.A. & Song F. (1998) Systematic reviews of trials and other studies. Health Technology Assessment 2(19), 1–310.
Sutton A.J., Abrams K.R., Jones D.R., Sheldon T.A. & Song F. (2000c) Methods for Meta-Analysis in Medical Research. John Wiley, London.
Sutton A.J., Duval S.J., Tweedie R.L., Abrams K.R. & Jones D.R. (2000a) Empirical assessment of effect of publication bias on meta-analyses. British Medical Journal 320, 1574–1577.
Sutton A.J., Lambert P.C., Hellmich M., Abrams K.R. & Jones D.R. (2000b) Meta-analysis in practice: a critical review of available software. In Meta-Analysis in Medicine and Health Policy (eds D.A. Berry & D.K. Stangl). Marcel Dekker, New York.
Thompson S.G. (1993) Controversies in meta-analysis: the case of the trials of serum cholesterol reduction. Statistical Methods in Medical Research 2, 173–192.
Tweedie R.L. & Mengersen K.L. (1995) Meta-analytic approaches to dose–response relationships, with application in studies of lung cancer and exposure to environmental tobacco smoke. Statistics in Medicine 14, 545–569.
Walter S.D. (1997) Variation in baseline risk as an explanation of heterogeneity in meta-analysis. Statistics in Medicine 16, 2883–2900.
Whitehead A. & Jones N.M.B. (1994) A meta-analysis of clinical trials involving different classifications of response into ordered categories. Statistics in Medicine 13, 2503–2515.