Everything wrong with statistics (and how to fix it)

Scientific research in a number of fields is in a state of crisis due to the discovery that many published results are non-reproducible, and applied statistics has been assigned a substantial share of the blame. Proposed solutions range from requiring independent statistical review of results for major journals to abolishing the use of certain methods entirely.

Lennox argues that the problem lies not with statistical methods but with misleading training for non-statisticians. The talk is intended to establish that statistics is not just a set of numerical procedures, but rather a distinctive way of thinking about and solving problems. Real-world examples demonstrate the pitfalls of "procedural" statistics and show that non-statisticians can succeed by approaching statistical challenges the same way they approach problems in their own field of expertise, and by drawing on the statistical expertise available at the laboratory as needed.

Published in: Data & Analytics

  1. Everything Wrong with Statistics (and How to Fix It). Kristin P. Lennox, Director of Statistical Consulting, July 29, 2015. LLNL-PRES-670181. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
  2. Crisis! [Image: open-access essay, freely available online: "Why Most Published Research Findings Are False" by John P. A. Ioannidis.] Summary: "There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may …" Pull quote: "It can be proven that most claimed research findings are false." Body excerpt: "… factors that influence this problem and some corollaries thereof. Modeling the Framework for False Positive Findings: Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching … is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV."
  3. What’s going on? Statistics is popular and important! Statisticians are rare. Statistics training isn’t working.
  4. STAT 101 is Procedural: 1. Check your data type. 2. Select inference method. 3. Calculate required sample statistics. 4. Look up critical values. … N. Report result.
  5. Real Statistics Isn’t
  6. Comprehensive Plan for Reform of All Statistics: 1) Show the problems with “cookbook statistics.” 2) Demonstrate real statistical thinking. 3) Help as needed.
  7. Golden Rules of Statistics (What Statisticians REALLY Do): § Know thy problem. § Know thy tools. § Know thy data.
  8. Know Thy Problem. STAT 101: Determine the appropriate analysis by looking at the data. E.g., two numeric variables = linear regression.
  9. Know Thy Problem. STAT 101: Determine the appropriate analysis by looking at the data. E.g., two numeric variables = linear regression. Appropriate data AND appropriate analysis depend on the real-world problem.
  10. The Million Dollar Binomial Distribution
  11. Know Thy Tools. STAT 101: Statistical methods are selected according to their appropriateness to the data and the correctness of their assumptions. STAT 101: Statistical procedures, used correctly, yield unambiguous results.
  12. Know Thy Tools. STAT 101: Statistical methods are selected according to their appropriateness to the data and the correctness of their assumptions. STAT 101: Statistical procedures, used correctly, yield unambiguous results. Statistical models work the same way that other scientific and engineering models work. Their validity depends on context, and they may be open to interpretation.
  13. Statistical Methods are Based on Models: $y_i = b_0 + b_1 x_i + \varepsilon_i$. [Figure: scatter plot of y (0 to 150) against x (0 to 50), alongside a probability density plot over x from about −4 to 4.]
  14. Statistical Methods are Based on Models: $y_i = b_0 + b_1 x_i + \varepsilon_i$. [Figure: the same scatter plot of y against x, alongside a probability density plot over x from about −5 to 15.]
  15. A Wise Man Once Said… “Essentially, all models are wrong, but some are useful.” – George E. P. Box
  16. How to Evaluate Explosives Safety. [Image: first page of “A Method for Obtaining and Analyzing Sensitivity Data” by W. J. Dixon (University of Oregon) and A. M. Mood (Iowa State College).] Excerpt: “The standard method of dealing with sensitivity of dosage-mortality data is the probit technique developed by Bliss and Fisher. This paper provides an alternative technique based on a special system for obtaining such data. It has some advantages when observations must be taken on individuals rather than groups of individuals, and it may be preferred in certain other situations. INTRODUCTION: Experimental investigations often deal with continuous variables which cannot be measured in practice. For example, in testing the sensitivity of explosives to shock, a common procedure is to drop a weight on specimens of the same explosive mixture from various heights. There are heights at which some specimens will explode, and others will not, and it is assumed that those which will not explode would explode were the weight dropped from a sufficiently greater height. It is supposed, therefore, that there is a critical height associated with each specimen, and that the specimen will explode when the weight is dropped from a greater height and will not explode when the weight is dropped from a lesser height. The population of specimens is thus characterized by a continuous variable, the critical height, which cannot be measured. All one can do is select some height arbitrarily and determine whether the critical height for a given specimen is less than or greater than the selected height. This situation arises in many fields of research. Thus in testing insecticides, a critical dose is associated with each insect, but one cannot …” [Figure: “Up-and-Down Test Demo,” normalized height (about −2 to 2) against test number (0 to 60), with each test marked x or o.]
  17. How NOT to Evaluate Explosives Safety. [Same Dixon and Mood excerpt and “Up-and-Down Test Demo” figure as slide 16, with an added quotation.] “…the up and down method is particularly effective for estimating the mean. It is not a good method for estimating small or large percentage points (for example, the height at which 99 per cent of specimens explode) unless normality of the distribution is assured.” – Dixon and Mood
  18. A Note on Statistical Significance (the following statements reflect only the author’s opinion, and should not be construed to reflect those of LLNL, the Applied Statistics Group, or any other person, statistician or not, living or dead). • There isn’t anything wrong with p-values … but p = 0.0501 is the same as p = 0.0499. • There isn’t anything wrong with statistical hypothesis testing … but it isn’t the right tool for making all decisions. These procedures aren’t broken. They are misused. This does not mean that you should keep using them.
  19. Know Thy Data. Parametric models are (of course) sensitive to assumptions, but purely data-driven approaches are far more robust to “cookbook” approaches.
  20. Know Thy Data. Parametric models are (of course) sensitive to assumptions, but purely data-driven approaches are far more robust to “cookbook” approaches. There are multiple cautions and caveats when using “big data” approaches. The most important is that you have to start with the right data.
  21. Jackie’s Improbable Sister. Jackie is a girl in a family with two children. What is the probability that Jackie has a sister? A. 1/2 B. 1/3 C. 0 or 1, but we don’t know which
  22. Jackie’s Improbable Sister. Jackie is a girl in a family with two children. What is the probability that Jackie has a sister? A. 1/2 B. 1/3 How did we find Jackie?
  23. Option A: 1/2. 1) Pick a two-child family at random. 2) Pick a child from the family at random.
  24. Option A: 1/2. 1) Pick a two-child family at random. 2) Pick a child from the family at random. Two girls have sisters and two girls have brothers.
  25. Option B: 1/3. 1) Pick a two-child family with at least one girl at random. 2) Report one girl’s name for each family.
  26. Option B: 1/3. 1) Pick a two-child family with at least one girl at random. 2) Report one girl’s name for each family. Of three possible families, only one has girls with sisters.
  27. Real (and Expensive) Problem. [Images labeled 1948, 1954, and 2013. The 2013 image is a Nature news article by Declan Butler, “Epidemiology: When Google got flu wrong. US outbreak foxes a leading web-based method for tracking seasonal flu,” with the photo caption “The latest US influenza season is more severe and has caused more deaths than usual.”] Legible excerpt: “When influenza hit early and hard in the United States this year, it quietly claimed an unacknowledged victim: one of the cutting-edge techniques being used to monitor the outbreak. A comparison with traditional surveillance data showed that Google Flu Trends, which estimates prevalence from flu-related Internet searches, had drastically overestimated peak flu levels. … complement, but not substitute for, traditional epidemiological surveillance networks. ‘It is hard to think today that one can provide disease surveillance without existing systems,’ says Alain-Jacques Valleron, an epidemiologist at the Pierre and Marie Curie University in Paris, and founder of France’s Sentinelles monitoring network. ‘The new systems depend too much on old existing ones to be able to live without them,’ he adds. … Its estimates have … matched the CDC’s own surveillance over time, and it delivers them faster than the CDC can. … has been extended to include … a second disease, dengue. … the latest US flu … to have confounded its algorithm… for the Christmas national peak … double the CDC’s … It is not the first time that a … tripped Google up. In 2009, … to tweak its algorithms after its … underestimated ILI in the United States … start of the H1N1 (swine flu) … glitch attributed to changes in … behaviour.”
  28. To summarize…
  29. Don’t:
  30. Do: § Know thy problem. § Know thy tools. § Know thy data.
  31. Do: § Know thy problem. § Know thy tools. § Know thy data.
  32. The LLNL Statistical Consulting Service provides up to 4 hours of assistance free of charge for LLNL projects. When in doubt: stats-consulting@llnl.gov, https://data-analytics.llnl.gov/statistical_consultants Thank you!
  33. Image sources: Wikipedia: Betty Crocker Cookbook, Salk Polio Vaccine. Wikipedia (CC BY-SA 3.0): George Box. Harry S. Truman Library: Bernard Dickmann with Harry S. Truman. Library of Congress: Chicago Tribune headline. Plain Unicorn: WPClipart. LLNL: NIF, Drop Hammer, Sigma the Statistics Unicorn.
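The excerpt on slide 2 stops just before the positive predictive value is written out. As a minimal sketch, the code below computes it from the quantities the excerpt does define (pre-study odds R, power 1 − β, Type I error rate α) by filling in the expected 2 × 2 table, which gives the no-bias form PPV = (1 − β)R / (R − βR + α) used in the essay. It deliberately omits the essay's bias and multiple-team extensions; the function names and the example inputs are illustrative assumptions, not taken from the talk.

```python
def expected_counts(c: float, R: float, power: float, alpha: float) -> dict:
    """Expected cells of the 2 x 2 table for c probed relationships (no bias term)."""
    n_true = c * R / (R + 1)        # relationships that are actually true
    n_false = c / (R + 1)           # relationships with no real effect
    return {
        "true_positive": power * n_true,          # claimed and actually true
        "false_negative": (1 - power) * n_true,   # true effects that were missed
        "false_positive": alpha * n_false,        # claimed but false (Type I errors)
        "true_negative": (1 - alpha) * n_false,   # correctly not claimed
    }


def ppv(R: float, power: float, alpha: float = 0.05) -> float:
    """Post-study probability that a claimed (significant) finding is true."""
    cells = expected_counts(1.0, R, power, alpha)
    claimed = cells["true_positive"] + cells["false_positive"]
    return cells["true_positive"] / claimed


if __name__ == "__main__":
    # Illustrative inputs: 80% power, alpha = 0.05, and two different pre-study odds.
    for odds in (1.0, 0.01):
        print(f"R = {odds}: PPV = {ppv(odds, power=0.8):.3f}")
```

With 80% power and α = 0.05, the PPV is about 0.94 when true and null relationships are equally common (R = 1), but only about 0.14 when one in a hundred is true (R = 0.01): the same p < 0.05 means very different things in different fields.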
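Slides 13 and 14 show the simple linear regression model $y_i = b_0 + b_1 x_i + \varepsilon_i$. A minimal sketch, not the analysis behind the slides: the least-squares estimates have a closed form, and the residuals they leave behind are what you would examine to judge whether the assumed error distribution is plausible. The function name `fit_line` and the toy data are hypothetical.

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares estimates (b0, b1) for y_i = b0 + b1*x_i + e_i, plus residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # slope
    b0 = y_bar - b1 * x_bar         # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


if __name__ == "__main__":
    x = [0, 10, 20, 30, 40, 50]
    y = [3, 24, 41, 66, 79, 104]    # toy data, not read from the slides
    b0, b1, res = fit_line(x, y)
    print(f"b0 = {b0:.2f}, b1 = {b1:.3f}")
    print("residuals:", [round(r, 2) for r in res])
```

The point estimates are computed the same way whatever the true error distribution is; what the error distribution affects is how far the usual normal-theory intervals and tests built on those estimates can be trusted.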
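Slides 16 and 17 describe the up-and-down (staircase) procedure: after an explosion the next drop is one step lower, and after a non-explosion it is one step higher. A minimal sketch, assuming a fixed step size, follows. The `explodes` callback is a hypothetical stand-in for a physical drop test, and the mean estimate uses the Dixon and Mood formula m = y′ + d(A/N ± 1/2), computed from whichever outcome is less frequent (minus sign when that outcome is explosions). As the slide 17 quotation warns, this targets the mean, not extreme points such as the height at which 99 percent of specimens explode.

```python
import random
from typing import Callable, List, Tuple


def up_and_down(start: float, step: float, n_trials: int,
                explodes: Callable[[float], bool]) -> List[Tuple[float, bool]]:
    """Run an up-and-down sequence; returns (height, exploded) for each trial."""
    height = start
    results = []
    for _ in range(n_trials):
        exploded = explodes(height)
        results.append((height, exploded))
        # Step down after an explosion, up after a non-explosion.
        height = height - step if exploded else height + step
    return results


def dixon_mood_mean(results: List[Tuple[float, bool]], step: float) -> float:
    """Dixon-Mood estimate of the mean critical height, from the less frequent outcome."""
    n_exploded = sum(1 for _, e in results if e)
    use_explosions = n_exploded <= len(results) - n_exploded
    levels = [h for h, e in results if e == use_explosions]
    if not levels:
        raise ValueError("need at least one trial of each outcome")
    y0 = min(levels)                                  # lowest level with that outcome (y')
    n = len(levels)                                   # N in the formula
    a = sum(round((h - y0) / step) for h in levels)   # A = sum of i * n_i
    return y0 + step * (a / n + (-0.5 if use_explosions else 0.5))


if __name__ == "__main__":
    rng = random.Random(1)

    # Hypothetical population: critical heights ~ Normal(0, 1) in normalized units.
    def drop_test(h: float) -> bool:
        # Explodes if the drop height exceeds this specimen's critical height.
        return h > rng.gauss(0.0, 1.0)

    runs = up_and_down(start=0.0, step=0.5, n_trials=60, explodes=drop_test)
    print("estimated mean critical height:", round(dixon_mood_mean(runs, 0.5), 2))
```

The staircase concentrates trials near the middle of the distribution, which is exactly why it estimates the mean efficiently and says little about the tails unless a distributional assumption (normality, in Dixon and Mood's caution) is brought in.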
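Slide 18's point that p = 0.0501 and p = 0.0499 carry essentially the same evidence can be made concrete with a two-sided z-test. In this hypothetical illustration (not from the talk), moving the test statistic from 1.959 to 1.961 takes the p-value from just above 0.05 to just below it and flips a fixed-threshold decision, although the evidence has barely changed.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def significant(p: float, alpha: float = 0.05) -> bool:
    """Hard-threshold decision rule of the kind slide 18 cautions against over-trusting."""
    return p < alpha


if __name__ == "__main__":
    for z in (1.959, 1.961):
        p = two_sided_p(z)
        print(f"z = {z:.3f}  p = {p:.4f}  significant: {significant(p)}")
```

The identity used here is 2(1 − Φ(|z|)) = erfc(|z|/√2), so only the standard library is needed.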
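The two answers on slides 23 through 26 come from two different ways of finding Jackie. The sketch below counts exactly over the four two-child families (birth orders GG, GB, BG, BB) under each sampling scheme. It assumes each child is independently a girl or a boy with probability 1/2, so the four families are equally likely; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families, as (first child, second child).
FAMILIES = list(product("GB", repeat=2))


def option_a() -> Fraction:
    """Pick a family at random, then a child at random; condition on that child being a girl."""
    girls = [(fam, i) for fam in FAMILIES for i in range(2) if fam[i] == "G"]
    with_sister = [(fam, i) for fam, i in girls if fam[1 - i] == "G"]
    return Fraction(len(with_sister), len(girls))


def option_b() -> Fraction:
    """Pick a family with at least one girl at random; one girl's name is reported."""
    eligible = [fam for fam in FAMILIES if "G" in fam]
    with_sister = [fam for fam in eligible if fam == ("G", "G")]
    return Fraction(len(with_sister), len(eligible))


if __name__ == "__main__":
    print("Option A:", option_a())
    print("Option B:", option_b())
```

Under Option A, each girl in the population is equally likely to be Jackie, and two of the four girls have sisters; under Option B, each qualifying family is equally likely, and only one of the three has two girls.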
