Watch the video at: https://www.statsols.com/webinars/practical-methods-to-overcome-sample-size-challenges
In this webinar hosted by Ronan Fitzpatrick - Head of Statistics and nQuery Lead Researcher at Statsols - we will examine some of the most common practical challenges you will experience while calculating sample size for your study. These challenges will be split into two categories:
1. Overcoming Sample Size Calculation Challenges
(Survival Analysis Example)
We will examine practical methods to overcome common sample size calculation issues by focusing on one of the more complex areas for sample size determination: survival analysis. We will cover difficulties and potential issues surrounding challenges such as:
Drop Out: How to deal with expected dropouts or censoring? We compare the simple loss-to-follow-up method with integrating a dropout process into the sample size model.
Planning Uncertainty: How best to deal with the inevitable uncertainty at the planning stage? We examine how to apply sensitivity analysis and Bayesian approaches to explore the uncertainty in your sample size calculations.
Choosing the Effect Size: Various approaches and interpretations exist for how to find the effect size value. We examine those contrasting interpretations, determine the best method, and look at how to deal with parameterization options.
2. Overcoming Study Design Challenges
(Vaccine Efficacy Example)
The Randomised Controlled Trial (RCT) is considered the gold standard in trial design in drug development. However, there are often practical impediments which mean that adjustments or pragmatic approaches are needed for some trials and studies.
We will examine practical methods to overcome common study design challenges and how these affect your sample size calculations. In this webinar, we will use common issues in vaccine study design to examine difficulties such as:
Case-Control Analysis: We will examine how to deal with study constraints and with analyses done during an observational study.
Alternative Randomization Methods: How best to address randomization in your vaccine trial design when full randomization is difficult, expensive or impractical? We examine how sample size calculations are affected by cluster or Mendelian randomization.
Rare Events: How does an outcome being rare affect the types of study design and statistical methods chosen in your study?
7. Sample Size Determination (SSD) Review
• SSD finds the appropriate sample size for your study
- Common metrics are statistical power, interval width or cost
• SSD seeks to balance ethical and practical issues
- A standard design requirement for regulatory purposes
• SSD is crucial to arrive at valid conclusions in a study
9. 5 Essential Steps for Sample Size
1. Plan Study: study question, primary outcome, method
2. Specify Parameters: significance level, standard deviation, dispersion
3. Choose Effect Size: expected/targeted difference, ratio or effect size
4. Compute Sample Size for a specified metric such as …
12. Dealing with Planning Uncertainty
• SSD occurs before the trial, so uncertainty is an intrinsic challenge
• Will specify multiple parameters
- Some fixed (α), some unknown (σ)
• More parameters, more assumptions
• Justify, if possible, your estimates
- Pilot study, previous data, other studies, elicitation, guesswork etc.
13. Exploring SSD Planning Uncertainty
• Biggest pitfall: not bothering
• Should explore all assumptions
- Design, ES, other values, N, power
• Should do a sensitivity analysis
- What scenarios to test? (95% CI)
- How confident in first N estimate?
- Focus where uncertain/large effect

"The method by which the sample size is calculated should be given in the protocol, together with the estimates of any quantities used in the calculations (such as variances, mean values, response rates, event rates, difference to be detected)… It is important to investigate the sensitivity of the sample size estimate to a variety of deviations from these assumptions…"
- ICH E9: Statistical Principles
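The ICH E9 advice above can be put into practice in a few lines of code. As a minimal sketch (the scenario below is illustrative, not from the webinar), this recomputes a two-sample z-test sample size across a range of assumed standard deviations to see how sensitive N is to the σ estimate:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Two-sample z-test sample size per group (normal approximation)."""
    norm = NormalDist()
    z_a = norm.inv_cdf(1 - alpha / 2)
    z_b = norm.inv_cdf(power)
    return ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)

# Sensitivity analysis: how does N react if sigma was mis-estimated?
for sigma in (0.8, 1.0, 1.2, 1.4):
    print(f"sigma={sigma:.1f} -> n per group = {n_per_group(0.5, sigma)}")
```

Tabulating N over plausible parameter scenarios like this is the simplest form of the sensitivity analysis the guideline asks for.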
14. How to Select the Correct Effect Size?
• Differing opinions on meaning
- Expected, clinical sig., standardised
- "Minimum value worth detecting"
• A number of approaches to find the value
- Mix and match sources/methods
- Approach and meaning linked
• Do not reverse-justify from N!
Source: S. Julious (2017)
15. Sample Size for Survival Analysis
• Survival analysis is about the expected duration of time to an event
- Methods like the log-rank test & Cox model
• Power is related to the number of events, NOT the sample size
- Sample size = subjects needed to get the required number of events
• Flexibility expected in survival analysis methods and estimation
- Sample size methods need to follow suit but can make mistakes easier!
Source: SEER, NCI
16. Considerations in Survival Sample Size
• What is the expected survival curve(s) in the group(s)?
- Assume parametric approximation? Which test appropriate?
• Effect of unequal follow-up due to accrual period?
- What accrual pattern to assume? Set max follow-up same for all?
• How to deal with expected dropouts or censoring?
- Simple loss-to-follow-up or integrate a dropout process into the model?
17. Survival Analysis Example
"Using an unstratified log-rank test at the one-sided 2.5% significance level, a total of 282 events would allow 92.6% power to demonstrate a 33% risk reduction (hazard ratio for RAD/placebo of about 0.67, as calculated from an anticipated 50% increase in median PFS, from 6 months in the placebo arm to 9 months in the RAD001 arm). With a uniform accrual of approximately 23 patients per month over 74 weeks and a minimum follow-up of 39 weeks, a total of 352 patients would be required to obtain 282 PFS events, assuming an exponential progression-free survival distribution with a median of 6 months in the placebo arm and of 9 months in the RAD001 arm. With an estimated 10% lost to follow up patients, a total sample size of 392 …"
Source: nejm.org

Parameters:
- Significance Level (One-Sided): 0.025
- Placebo Median Survival (months): 6
- Everolimus Median Survival (months): 9
- Hazard Ratio: 0.66667
- Accrual Period (Weeks): 74
- Minimum Follow-Up (Weeks): 39
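The event count and power in this protocol can be sanity-checked with Schoenfeld's approximation for the log-rank test, under which power depends on the number of events and the log hazard ratio rather than the number of subjects. This is a sketch; the published calculation may have used a slightly different formula:

```python
from math import log, sqrt
from statistics import NormalDist

norm = NormalDist()

def logrank_power(events, hr, alpha_one_sided=0.025):
    """Power of the log-rank test via Schoenfeld's approximation,
    assuming 1:1 allocation: z = sqrt(d)/2 * |log(HR)| - z_alpha."""
    z_a = norm.inv_cdf(1 - alpha_one_sided)
    return norm.cdf(sqrt(events) / 2 * abs(log(hr)) - z_a)

# Median PFS 6 -> 9 months corresponds to HR = 6/9 ~ 0.667
power = logrank_power(282, hr=6 / 9)
print(f"Power with 282 events: {power:.1%}")  # ~92.6%, matching the protocol
```

Converting the 282 events into a subject total additionally requires the exponential survival, accrual and follow-up assumptions in the table above, and different event-probability formulas can give slightly different totals, which is one more reason the method used should always be reported.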
18. Assurance for Clinical Trials
• Assurance (a.k.a. "Bayesian power") is the unconditional probability of significance given a prior
• Focus on methods proposed by O'Hagan et al. (2005)
• Assurance is the expectation of the power averaged over a prior distribution for the effect
• Often framed as the "true probability of success" of a trial
• Can be considered a Bayesian analogue to statistical power
Source: O'Hagan (2005)
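A minimal sketch of assurance by Monte Carlo: average the power of a one-sided two-sample z-test over a normal prior on the mean difference. All numbers here (effect prior N(0.5, 0.2²), n = 64 per group, σ = 1) are illustrative assumptions, not from the webinar:

```python
import random
from math import sqrt
from statistics import NormalDist

norm = NormalDist()

def z_test_power(delta, n_per_group, sigma=1.0, alpha_one_sided=0.025):
    """Power of a one-sided two-sample z-test at effect size delta."""
    se = sigma * sqrt(2 / n_per_group)
    return norm.cdf(delta / se - norm.inv_cdf(1 - alpha_one_sided))

def assurance(prior_mean, prior_sd, n_per_group, draws=100_000, seed=1):
    """Expected power averaged over a normal prior on the effect."""
    rng = random.Random(seed)
    total = sum(z_test_power(rng.gauss(prior_mean, prior_sd), n_per_group)
                for _ in range(draws))
    return total / draws

print(f"Conditional power at prior mean: {z_test_power(0.5, 64):.3f}")
print(f"Assurance: {assurance(0.5, 0.2, 64):.3f}")
```

Because power at the prior mean is above 50%, assurance comes out lower than the conditional power, reflecting the prior probability that the true effect is smaller than assumed.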
20. Study Design and Sample Size
• Fail to prepare, prepare to fail: Design, Design, Design!
• SSD is one part of the design process
- Hypotheses, methods etc. (next slide)
• To start, can rely on a "standard" method
• Consider each choice's effect on power
- See "dichotomania"; different methods' strengths/weaknesses
• Power can also highlight poor …
Source: S. Senn (2005)
21. Common Design Considerations & SSD
1. Basic Design Considerations and Options
- Data type, # groups, matching, covariates, rare event?, RCT?
2. Study Objective Considerations and Options
- Hypothesis type(s), primary endpoint(s), co-primary/secondary?
3. Complex Design Considerations and Options
- Observational/retrospective, randomization, adaptive methods
4. Statistical Considerations and Options
- Statistical model, assumptions, adjustments, …
22. Vaccine Study Sample Size Background
• Vaccine trials typically rely on randomized trials but deal with specific issues
- Rare events, large sample size, event lag, clustering in population studies
• Due to the above, a wider variety of methods and evidence sources is used pre- & post-approval
• Will explore briefly here
Source: R. Ahlawat (2015)
23. Vaccine Efficacy Subject-Level Example
"An investigator desires a 95 per cent confidence interval for VE where he anticipates that the prevalence rate of vaccine exposure in the control group is 20 per cent, and the vaccine efficacy VE is 80 per cent and the desired relative width of the interval is 0.30. That is, the absolute width of the confidence interval will be (0.8 × 0.3) = 0.24 … for C = 1 where the numbers of cases and controls are equal yields a case sample size of 336. For C = 4, a 4 to 1 control to case size ratio, the case sample …"
Source: Wiley.com

Parameters:
- Confidence Level (Two-Sided): 0.95
- Vaccine Efficacy: 0.8
- Control Disease Prevalence: 0.2
- Relative Width (Absolute): 0.3 (0.24)
- Case to Controls Ratio: 1, 4
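One way to reproduce the quoted case sample size of 336 is a confidence-interval calculation in the spirit of O'Neill (1988): express VE = 1 − RR, derive the cases' vaccine-exposure probability from the control prevalence and RR, and size the study so a CI built on the log relative-risk scale has the target width. This is a sketch; the published derivation may differ in detail:

```python
from math import asinh, ceil
from statistics import NormalDist

def ve_ci_cases(ve, control_prev, rel_width, ratio_c=1, conf=0.95):
    """Cases needed so a (1-alpha) CI for VE has the target relative
    width, with ratio_c controls per case (log relative-risk scale)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    rr = 1 - ve                       # relative risk = 1 - VE
    # Probability a case is vaccine-exposed (rare-disease approximation)
    p1 = control_prev * rr / (control_prev * rr + 1 - control_prev)
    p0 = control_prev                 # exposure probability in controls
    half_width = ve * rel_width / 2   # absolute half-width on VE scale
    # CI width on RR scale is 2*rr*sinh(z*se) -> solve for var(log RR)
    var_log_rr = (asinh(half_width / rr) / z) ** 2
    per_case_var = 1 / (p1 * (1 - p1)) + 1 / (ratio_c * p0 * (1 - p0))
    return ceil(per_case_var / var_log_rr)

print(ve_ci_cases(ve=0.8, control_prev=0.2, rel_width=0.3, ratio_c=1))  # 336
```

Raising the control-to-case ratio C above 1 shrinks the control contribution to the variance, so fewer cases are needed.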
24. Vaccine Efficacy Area-Level CRT Example
"A total of approximately 24,834 participants will be needed in order to have 80% power to detect a 60% vaccine protection at a 5% level of significance (two-tailed). This sample size calculation assumed a minimum cumulative typhoid incidence of 2.8 per 1,000 (during 2 years) and a between-cluster coefficient of variation (CV) of 0.5. Assuming an annual attrition rate of 10%, a total sample of 27,592 participants is needed. Each cluster will be composed of groups of households. Cluster size will be 200-600 individuals (120 …"
Source: ScienceDirect.com

Parameters:
- Significance Level (2-Sided): 0.05
- Expected Control Incidence Rate: 0.0028
- Treatment Incidence Rate: 0.00112
- Coefficient of Variation: 0.5
- Sample Size per Cluster: 207
- Power: 80%
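These figures are consistent with a standard formula for unmatched cluster randomized trials comparing proportions, which inflates the subject-level requirement using the between-cluster coefficient of variation k (a sketch in the style of Hayes & Bennett; it gives 60 clusters per arm and 24,840 participants, in line with the quoted ~24,834 up to rounding):

```python
from math import ceil
from statistics import NormalDist

def crt_clusters(p0, p1, m, k, alpha=0.05, power=0.80):
    """Clusters per arm for comparing proportions in an unmatched
    cluster randomized trial with cluster size m and between-cluster
    coefficient of variation k."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    var = (p0 + p1) / m + k**2 * (p0**2 + p1**2)
    return ceil(1 + z**2 * var / (p0 - p1) ** 2)

c = crt_clusters(p0=0.0028, p1=0.00112, m=207, k=0.5)
print(c, 2 * c * 207)  # 60 clusters per arm, 24840 participants
```

Inflating for attrition by dividing by 0.9 then gives roughly 27,600, in line with the quoted 27,592.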
25. Mendelian Randomization Studies
• Mendelian Randomization (MR) uses underlying genetic variation to make causal inferences
• Uses genes with a well-understood link between polymorphism(s) and the relevant intermediate phenotype
• Note that the gene must be indirectly related to the exposure of interest
• MR uses the gene(s) as an instrumental variable
Source: S. Burgess et al. (2012)
26. Mendelian Randomization Example
"We computed F statistics and R² values (the proportion of variation in height and BMI explained by the genetic risk score) from the linear regression to evaluate the strength of the genetic risk score instruments in a population of men at increased risk of cancer. We had 82 and 78% power to detect an odds ratio of 1.12 and 1.25 for the effects of height and BMI on prostate cancer risk, assuming a sample size of 41,062 and that the genetic risk scores explained 6.31 and 1.46% of the variation …"
Source: Springer.com

Parameters:
- Significance Level (Two-Sided): 0.05
- Positive Outcome Proportion: 0.5
- Odds Ratio: 1.12 / 1.25
- Variance Explained: 0.0631 / 0.0146
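The quoted 82% and 78% power figures can be reproduced with a normal approximation for Mendelian randomization power with a binary outcome (a sketch in the style of Burgess, 2014), where power depends on N, the variance in the exposure explained by the instrument (R²), the case proportion and the log odds ratio:

```python
from math import log, sqrt
from statistics import NormalDist

norm = NormalDist()

def mr_power(n, r_squared, odds_ratio, case_prop=0.5, alpha=0.05):
    """Approximate power for MR with a binary outcome: the IV estimate's
    precision scales with sqrt(n * R^2 * p * (1 - p))."""
    z_a = norm.inv_cdf(1 - alpha / 2)
    shift = sqrt(n * r_squared * case_prop * (1 - case_prop)) * abs(log(odds_ratio))
    return norm.cdf(shift - z_a)

print(f"Height: {mr_power(41062, 0.0631, 1.12):.0%}")  # 82%
print(f"BMI:    {mr_power(41062, 0.0146, 1.25):.0%}")  # 78%
```

Note how the much smaller R² for BMI is offset by the larger detectable odds ratio, leaving the two power values close together.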
28. Discussion and Conclusions
• SSD is one of many integral parts of the planning process
- SSD emerges from the study plan & statistical method/model
• More planning = easier sample size determination
- Design carefully given study type and constraints
• In SSD, think hard on both explicit & implicit assumptions
29. nQuery Timeline
1996-2007: Launch of nQuery Advisor 1.0, developed by Dr. Janet D. Elashoff; continuous innovation and releases
2007-2016: nTerim introduced (G.S.T, C.R.T, Count Data, MANOVA/ANOVA)
2017: Launch of nQuery Advanced; new platform = modern all-in-one software solution
Spring 2018: New Bayesian Module; Survival Focus; IQ/OQ Tools; 52 new Core Tables; 20 new Bayes Tables
30. nQuery Spring 2018 Update
Initial release focused on Survival & Bayesian tables. The April release adds 72 new tables in the following areas:
- New tables in April update: Epidemiology, Non-inferiority/Equivalence, Correlation/ROC
- New Bayes tables in April update: Bayesian Sample Size
32. References
Senn, S. (2008). Statistical Issues in Drug Development (2nd Edition). John Wiley & Sons.
Senn, S. (2005). Dichotomania: an obsessive compulsive disorder that is badly affecting the quality of analysis of pharmaceutical trials. Proceedings of the International Statistical Institute, 55th Session, Sydney.
Julious, S. A. (2009). Sample Sizes for Clinical Trials. CRC Press.
Yao, J. C., et al. (2011). Everolimus for advanced pancreatic neuroendocrine tumors. New England Journal of Medicine, 364(6), 514-523.
O'Hagan, A., Stevens, J. W., & Campbell, M. J. (2005). Assurance in clinical trial design. Pharmaceutical Statistics, 4(3), 187-201.
O'Neill, R. T. (1988). On sample sizes to estimate the protective efficacy of a vaccine. Statistics in Medicine, 7(12), 1279-1288.
Khan, M. I., Soofi, S. B., Ochiai, R. L., Habib, M. A., Sahito, S. M., Nizami, S. Q., ... & DOMI Typhoid Karachi Vi Effectiveness Study Group. (2012). Effectiveness of Vi capsular polysaccharide typhoid vaccine among children: a cluster randomized trial in Karachi, Pakistan. Vaccine, 30(36), 5389-5395.
Burgess, S. (2014). Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome. International Journal of Epidemiology, 43, 922-929.
Davies, N. M., et al. (2015). The effects of height and BMI on prostate cancer incidence and mortality: a Mendelian randomization study.
Speaker notes
Point 1:
http://rsos.royalsocietypublishing.org/content/1/3/140216 -> Screening problem analogy.
Type S Error = Sign Error i.e. sign of estimate is different than actual population value
Type M Error = Magnitude Error i.e. estimate is order of magnitude different than actual value
Point 2:
Know we have only 100 subjects available. Need to know what power this will give us, i.e. is there enough power to justify even doing the study?
Phase III clinical trials constitute 90% of trial costs; vital to reduce waste and ensure the trial can fulfil its goal.
Point 3:
Sample Size requirements described in ICH Efficacy Guidelines 9: STATISTICAL PRINCIPLES FOR CLINICAL TRIALS
See FDA/NIH draft protocol template here: http://osp.od.nih.gov/sites/default/files/Protocol_Template_05Feb2016_508.pdf (Section 10.5)
Nature Statistical Checklist: http://www.nature.com/nature/authors/gta/Statistical_checklist.doc
Point 4:
In Cohen's (1962) seminal power analysis of the Journal of Abnormal and Social Psychology he concluded that over half of the published studies were insufficiently powered to result in statistical significance for the main hypothesis. Many journals (e.g. Nature) now require that authors submit power estimates for their studies.
Power/sample size is one of the areas highlighted when discussing the "crisis of reproducibility" (Ioannidis). Relatively easy fix compared to finding p-hacking etc.
More detail available on our website via a whitepaper.
Alternative linear rank tests include Tarone-Ware, Gehan. Planned for next release circa Summer 2016.
Sample size is mainly asking "How many subjects are needed to attain X events?"
Most methods optimised for exponential survival but could enter piece-wise linear approximation of probability at time t for other distributions (e.g. Weibull)
Analytic vs Simulation = Much wider debate. Usually have ease of use vs flexibility trade-off. Simulation better suited to programming environment e.g. R
Appropriate would usually be defined in terms of preventing too low sample size, though too high has practical costs.
Other metrics include confidence interval width (precision), cost based and Bayesian methods.
Important to note that you need to specify an exact value for the effect even though the alternative hypothesis acceptance space can technically be any non-null hypothesis point value (commonly any non-zero value).
Can be thought of as the area of the alternative pdf which is contained within the rejection region of the null hypothesis.
Interesting to note that power of 50% is equivalent to lower limit of CI being equal to zero for zero-based z-statistic null