3. Anecdotal evidence (Case reports); Evidence based medicine (The Cochrane collaboration 1993); Cohort study of smoking and lung cancer (1954, Bradford Hill); Case-control study of smoking and lung cancer (1950, Bradford Hill); Randomised clinical trial of streptomycin and tuberculosis (1948, Bradford Hill)
4. EU directive (2001); ICH GCP (1996); CONSORT (1996); WHO CIOMS (1993); ICMJE Uniform Requirements (1978); Helsinki declaration (1964); Nürnberg convention (1949); Trial registration (2005); Mandatory disclosure of trial results (2008)
5. Plan 1. Methodological background 2. General guidelines 3. Special recommendations a) case reports b) mechanical experiments c) in vitro/cadaver experiments d) cross-sectional studies e) epidemiological studies f) randomized trials 4. Summary
7. What is statistics used for? 1. Describing data (statistics in the plural) 2. Interpreting uncertain data (statistics in the singular)
8. Two kinds of uncertainty 1. Uncertainty of measurement 2. Uncertainty of sampling
9. 1. Uncertainty of measurement The precision of the measurement instrument used. The precision of the Finapres non-invasive blood pressure monitor is on average 12.1 mm Hg.
10. 2. Uncertainty of sampling Individual effects vary between subjects. Different samples of subjects yield different observed mean effects.
11. Example Assume that the cumulative 10-year revision rate of the Oxford knee prosthesis is 8% and that two groups of 100 patients receiving the prosthesis are randomly selected and followed over time. The two groups are likely to have different numbers of patients revised during follow-up.
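The example above can be tried out with a small simulation. This is a minimal sketch, assuming each patient is revised independently with probability 0.08; the function name `simulate_revisions` and the fixed seed are illustrative choices, not part of the original slides.

```python
import random

def simulate_revisions(n_patients=100, revision_rate=0.08, rng=None):
    """Count how many of n_patients are revised, each with the same probability."""
    if rng is None:
        rng = random.Random()
    return sum(rng.random() < revision_rate for _ in range(n_patients))

rng = random.Random(1)  # fixed seed only to make the sketch reproducible
group_a = simulate_revisions(rng=rng)
group_b = simulate_revisions(rng=rng)
print(f"Group A: {group_a}/100 revised, group B: {group_b}/100 revised")
```

Rerunning with other seeds shows how much the observed revision counts vary from sample to sample although the underlying rate is the same.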
15. 6% revised vs. 12% revised. H0: The two samples represent the same population. H1: The two samples represent different populations.
16. P-value The probability of observing an effect at least as large as the one observed if only sampling uncertainty were at work, i.e. if the two samples represent the same population. 12/100 vs. 6/100, Fisher's exact test p = 0.22
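The p-value for 12/100 vs. 6/100 can be recomputed with a short script. This is a minimal sketch of a two-sided Fisher's exact test using only the standard library; it assumes the common convention of summing the probabilities of all tables that are no more likely than the observed one, and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small tolerance guards against floating-point ties)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-9))

# Rows: group with 12 revised / 88 not revised, group with 6 revised / 94 not revised
p = fisher_exact_two_sided(12, 88, 6, 94)
print(f"p = {p:.2f}")
```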
17. P-values are often misunderstood They cannot - describe clinical relevance (they depend on sample size) - show that a difference "does not exist", because n.s. is absence of evidence, not evidence of absence
19. Confidence interval A range of values that, with the specified confidence level, is expected to include the population parameter being estimated. 12/100 vs. 6/100, RR = 2.0 (95% CI: 0.7 - 5.6) [Figure: relative risk axis marked at 1/2, 1 and 2, with regions labelled p < 0.05 and n.s.]
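A relative risk and its confidence interval can be sketched as follows. The slides do not say how the interval was calculated; the code assumes the common log-scale (Wald) approximation, which gives limits somewhat different from those on the slide but the same conclusion: the interval includes 1.

```python
from math import exp, log, sqrt

def relative_risk_ci(events_1, n_1, events_2, n_2, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = relative_risk_ci(12, 100, 6, 100)
print(f"RR = {rr:.1f} (95% CI: {lower:.2f} - {upper:.2f})")
```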
21. Important assumptions Many statistical methods, like Student's t-test and ANOVA, are based on the assumption of Gaussian distribution and homogeneous variance. If the assumptions are not met, use alternative (non-parametric) methods, like the Mann-Whitney U-test or the Kruskal-Wallis test (non-parametric ANOVA).
23. Important assumptions Most conventional methods (both parametric and non-parametric) require independent observations. - Patients are independent - Patients' knees, hips, shoulders, feet, etc. are not
25. How Many Patients? How Many Limbs? Analysis of Patients or Limbs in the Orthopaedic Literature: A Systematic Review Bryant et al. JBJS Am. 2006;88:41-45. Our findings suggest that a high proportion (42%) of clinical studies in high-impact-factor orthopaedic journals involve the inappropriate use of multiple observations from single individuals, potentially biasing results. Orthopaedic researchers should attend to this issue when reporting results.
26. Important assumptions Most conventional methods (both parametric and non-parametric) require independent observations. Include only one observation per patient, or use a statistical method that can handle dependent data, e.g. multilevel or mixed effects models. Always present both the number of observations and the number of patients.
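One simple way to get a single observation per patient is to average each patient's values before analysis. The sketch below uses made-up data (patient ids, knee sides and scores are hypothetical) and is not a substitute for a multilevel or mixed effects model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (patient id, knee side, outcome score).
# Patients 1 and 3 contribute both knees, patient 2 only one.
observations = [
    (1, "left", 82.0), (1, "right", 78.0),
    (2, "left", 90.0),
    (3, "left", 65.0), (3, "right", 71.0),
]

def one_value_per_patient(rows):
    """Average all observations from the same patient into a single value."""
    by_patient = defaultdict(list)
    for patient_id, _side, score in rows:
        by_patient[patient_id].append(score)
    return {pid: mean(scores) for pid, scores in by_patient.items()}

per_patient = one_value_per_patient(observations)
# Report both counts, as the slide recommends
print(f"{len(observations)} observations from {len(per_patient)} patients")
```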
27. Multiplicity In contrast to many other forms of precision, statistical precision depends on the number of performed measurements (significance tests).
28. Multiplicity Each significance test at a 5% significance level has a 5% risk of a false positive test. Repeated testing increases the risk of at least one false positive test.
Number of tests   Risk of at least one false positive
1                 0.05
2                 0.10
5                 0.23
10                0.40
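The risks in the table follow from a one-line formula. This is a minimal sketch, assuming the tests are independent and each is performed at the same significance level; the function name is illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Risk of at least one false positive among n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 2, 5, 10):
    print(f"{n_tests:>2} tests: {familywise_error_rate(n_tests):.4f}")
```

Rounded to two decimals, the results agree with the table.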
36. Statistical Methods "Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results." Required for analytical methods (statistical models, hypothesis tests, confidence intervals). Descriptions are often unclear, vague or ambiguous. They need to be clear and detailed.
38. Results "When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals)." Measures of statistical precision (p-values and confidence intervals) are necessary for generalization of results beyond the examined patients.
40. Results "Avoid relying solely on statistical hypothesis testing, such as the use of P values, which fails to convey important information about effect size." Describe both your observations and how you interpret them (use confidence intervals or p-values).
41.
                             Statistically significant
                             yes     no
Clinically significant  yes  a       b
Clinically significant  no   c       d
"There was, or was no, (statistically significant) difference" is too simplistic
42. Example Two side effects with a new osteoporosis treatment: - A statistically significant reduction in body hair growth rate by 5% (p = 0.04) - A statistically insignificant increase in systolic blood pressure by 25 mmHg (p = 0.06)
43. Confidence intervals are better than p-values In contrast to p-values they do - relate to clinical significance - show when a difference "does not exist" because they present lower and upper limits of potential clinical effects/differences
44. [Figure: confidence intervals plotted on an effect axis marked at 0 and at the limit for clinically significant effects.] P-values give 2 alternative conclusions (p < 0.05 or n.s.); confidence intervals give 6:
- Statistically and clinically significant effect (p < 0.05)
- Statistically, but not necessarily clinically, significant effect (p < 0.05)
- Statistically but not clinically significant effect (p < 0.05)
- Inconclusive (n.s.)
- Neither statistically nor clinically significant effect (n.s.)
- Statistically significant reversed effect (p < 0.05)
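The six conclusions can be expressed as a small decision rule. This is a sketch under stated assumptions: positive effects are the beneficial direction, `clinical_threshold` is a hypothetical smallest clinically significant effect chosen by the investigator, and the handling of exact boundary values is an illustrative choice, not taken from the slides.

```python
def interpret_interval(lower, upper, clinical_threshold):
    """Classify a confidence interval for an effect where positive values are beneficial.

    clinical_threshold is a hypothetical smallest clinically significant effect (> 0).
    """
    if upper < 0:
        return "statistically significant reversed effect"
    if lower > 0:
        # The interval excludes 0, so the effect is statistically significant
        if lower >= clinical_threshold:
            return "statistically and clinically significant effect"
        if upper < clinical_threshold:
            return "statistically but not clinically significant effect"
        return "statistically, but not necessarily clinically, significant effect"
    # The interval includes 0, so the result is not statistically significant
    if upper < clinical_threshold:
        return "neither statistically nor clinically significant effect"
    return "inconclusive"

print(interpret_interval(-1.0, 4.0, clinical_threshold=2.0))
```

A p-value alone would only separate the intervals that exclude 0 from those that include it.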
45. When there is a difference in data Do not write that there is not a difference!
47. There were indeed differences; they are 0.45 and 0.57. Better alternative: "The observed differences in extraction torques between the two types of uncoated distal pins can be explained by chance."
48. Avoid non-technical use of technical terms and use clear expressions
- significant: clinically or statistically?
- no difference: statistically insignificant?
- statistical difference: statistically significant?
- matched: selected or just comparable?
- correlation: relation, regression?
- normal: Gaussian distribution?
- random: mathematical algorithm?
- etc.
55. Mechanical experiments What do p-values and confidence intervals relate to? - Measurement uncertainty (Perhaps) - Sampling uncertainty (No, there is no information on subject variation. The findings cannot be generalized beyond the device).
57. In vitro/cadaver experiments What do p-values and confidence intervals relate to? - Measurement uncertainty (Perhaps) - Sampling uncertainty (Perhaps, if the observations provide information on variation between subjects)
58. Example In a study with 60 observations, 20 specimens had been taken from each of 3 subjects. The specimens were distributed randomly between one control group and one experimental group. What do significance tests of these two groups tell us?
62. Epidemiological studies - Exploratory, hypothesis generating, multiplicity issues considered less important than validity issues - External validity (source of subjects) - Internal validity (confounding)
64. Results Uniform Requirements: "Where scientifically appropriate, analyses of the data by variables such as age and sex should be included." Observational studies require adjustment for known and suspected confounding factors to produce valid effect estimates. This adjustment is usually performed using statistical modelling (e.g. ANCOVA or regression analysis). The purpose is to increase validity.
65. Results Automatic stepwise regression (forward or backward) is not an adequate method for confounding adjustment.
67. Clinical trials "The ICMJE member journals will require, as a condition of consideration for publication in their journals, registration in a public trials registry." "The ICMJE recommends that journals publish the trial registration number at the end of the Abstract."
68. Clinical trials "When reporting experiments on human subjects, authors should indicate whether the procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2000 (5)."
69. WORLD MEDICAL ASSOCIATION DECLARATION OF HELSINKI Ethical Principles for Medical Research Involving Human Subjects 27. ...Reports of experimentation not in accordance with the principles laid down in this Declaration should not be accepted for publication.
70. Purpose of a randomized trial To test a hypothesis with control of random and systematic errors. - No bias (randomization & blinding) - No multiplicity problems
72. Study populations
Intention-to-treat (ITT) principle: Analyze all randomized subjects according to planned treatment regimen.
Full analysis set (FAS): The set of subjects that is as close as possible to the ideal implied by the ITT principle.
Per protocol (PP) set: The set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.
73. FAS vs. PP-set
FAS: + no selection bias; - misclassification problem (effect dilution)
PP-set: + no contamination problem; - possible selection bias (confounding)
When the FAS and PP-set lead to essentially the same conclusions, confidence in the trial is supported.
74. Endpoints
Primary: The variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.
Secondary: Either measurements supporting the primary endpoint or effects related to secondary objectives.
75. Statistical analyses
Confirmatory: The result concerns a primary endpoint and the p-value or confidence interval accounts for potential multiplicity. The result can support a claim of superiority, equivalence or non-inferiority.
Exploratory: All other analyses. The result is either supporting or explanatory, or simply just a new hypothesis.
76. Reporting "For reports of randomized controlled trials authors should refer to the CONSORT statement."
80. Include with the manuscript: - Study Protocol - Statistical Analysis Plan
81. Clinical trials International regulatory guidelines: - ICH Topic E9: Statistical Principles for Clinical Trials - EMEA Points to consider: baseline covariates, missing data, multiplicity issues, etc. - similar documents from the FDA. These guidelines can all be found on the internet.
83. The responsibilities of a statistical reviewer "To make sure that the authors spell out for the reader the limitations imposed upon the conclusions by the design of the study, the collection of data, and the analyses performed." Shor S. The responsibilities of a statistical reviewer. Chest 1972;61:486-487.
84. Read the manuscript from end to beginning, and look for weaknesses in the links between: 1. Conclusion 2. Discussion (Discussion section) 3. Results (Results section) 4. Methods (Material & methods section) 5. Data (Material & methods section) 6. Hypothesis (Introduction) Make sure the chain holds all the way!
85. Summary
1. Present statistical methods in detail, and the number of observations included in each analysis.
2. Present data, statistical results and your conclusions
- data description vs. results interpretation
- clinical vs. statistical significance
- absence of evidence is not evidence of absence
3. Adjust for confounding factors in observational studies (but do not use stepwise regression)
4. Comply with the CONSORT checklist in randomized studies