This document outlines a public health approach for realizing the promises of genomics and big data while addressing the challenges. It recommends: 1) Using a strong epidemiological foundation to study disease distribution and determinants in populations. 2) Developing a robust knowledge integration process to synthesize findings from different sources and disciplines. 3) Applying principles of evidence-based medicine and population screening to evaluate genomic applications. 4) Developing a robust translational research agenda beyond clinical applications to improve population health impact. The public health framework can help maximize the benefits of genomics and big data while minimizing risks.
1. Separating Signal from Noise in the Age of
Genomics & Big Data:
A Public Health Approach
Muin J. Khoury MD, PhD
CDC Office of Public Health Genomics
NCI Epidemiology & Genomics Research Program
2. Outline
Big Data & Causation in the Age of Genomics
Promises of Genomics & Big Data
Challenges of Genomics & Big Data
A Public Health Approach to Realize Potential of
Genomics & Big Data
3. A Case Study: Searching for Needles in the
Haystack- The CDC HuGE Navigator
http://www.hugenavigator.net/HuGENavigator/home.do
4. Text Mining Tool To Find HuGE Articles
in Published Literature
PubMed Signal/Noise ratio very
low
Support Vector Machine (SVM)
tool generated in 2008
Based on >3800 words in text,
extensively validated
Sensitivity & specificity >97%
Since 2008, genetic epidemiology
literature has changed
considerably
Performance of SVM model was
significantly reduced (60%)
In 2014, an SVM retrained on
>4500 words pushed sensitivity
and specificity back above 90%
Yu W et al. BMC Bioinformatics, 2008
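Why a low signal/noise ratio matters even for an accurate classifier can be sketched with a small calculation. The counts below are hypothetical (they are not from Yu et al.): they assume 100,000 abstracts of which 1% are relevant, screened by a classifier with 97% sensitivity and 97% specificity.

```python
def classifier_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)   # share of relevant items that are flagged
    specificity = tn / (tn + fp)   # share of irrelevant items that are not flagged
    ppv = tp / (tp + fp)           # share of flagged items that are relevant
    return sensitivity, specificity, ppv


# Hypothetical confusion matrix: 1,000 relevant and 99,000 irrelevant abstracts.
tp, fn = 970, 30          # 97% of the 1,000 relevant abstracts are caught
tn, fp = 96_030, 2_970    # 97% of the 99,000 irrelevant abstracts are rejected

sens, spec, ppv = classifier_metrics(tp, fp, fn, tn)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} ppv={ppv:.3f}")
```

Under these assumed numbers only about one flagged abstract in four is truly relevant, which is why performance has to be judged against the prevalence of the signal in the literature being searched.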
5. Application of Data Mining in the Prediction
of Type 2 Diabetes in the United States
1999-2004 National Health and
Nutrition Examination Survey
Developed and validated SVM
models for diabetes, undiagnosed
diabetes & prediabetes using
numerous variables in survey
Discriminative ability: areas
under the ROC curve of 84% and 73%
Validated known risk factors for
diabetes
Not clear which models are best, which
variables to use, and how applicable
the results are to other populations
Proof of concept only
Yu W et al. BMC Medical Informatics 2010
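The area under the ROC curve quoted above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case. A minimal sketch of that calculation follows; the scores are invented for illustration and do not come from the NHANES analysis.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores for people with and without diabetes.
cases = [0.9, 0.8, 0.6, 0.4]
noncases = [0.7, 0.5, 0.3, 0.2, 0.1]

auc_value = auc(cases, noncases)
print(f"AUC = {auc_value:.2f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect ranking; the measure says nothing about whether the predictors are causes of disease.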
6. The IOM Ecological Model & the Need for
Multilevel Analysis of “Causation”
Obesity Example NEJM 2007;357:404-7
IOM Ecological Model
7. Genomics & Big Data
The Genome is Just the Beginning
“We will all be surrounded by a personal cloud of billions of data points”
L. Hood (ISB)
8. Big Data: From Association to Prediction
How about Causation?
Association
Replication
Classification
Prediction
?CAUSATION
Does Big Data care about “Causation”?
Intervention is based on cause-effect
relationships
10. The Promises of Genomics & Big Data
Workup of Rare & Familial Diseases
NEJM June 2014
11. The Promises of Genomics & Big Data
Improved Disease Classification
12. The Promises of Genomics & Big Data
Improved Measurement of the “Environment”
http://www.niehs.nih.gov/research/programs/geh/geh_newsletter/2014/4/spotlight/index.cfm
13. The Promises of Genomics & Big Data
Better Understanding of Natural History
G Ginsburg
14. The Promises of Genomics & Big Data
Stratified Prevention (One size does not fit All)
No one is average: “population medicine: let’s get over it” (E. Topol)
17. The Promises of Genomics & Big Data
Public Health Practice
“As cholera swept through London in the
mid-19th century, a physician named John
Snow painstakingly drew a paper map
indicating clusters of homes where the
deadly waterborne infection had struck. In
an iconic feat in public health history, he
implicated the Broad Street pump as the
source of the scourge—a founding event in
modern epidemiology. Today, Snow might
have crunched GPS information and disease
prevalence data and solved the problem
within hours”
http://www.hsph.harvard.edu/news/magazine/big-datas-big-visionary/
18. Some Promises of Genomics & Big Data
Workup of Rare & Familial Diseases
Improved Disease Classification
Improved Measurement of the “Environment”
Better Understanding of Disease Natural History
Stratified Prevention
Precision Medicine
Pathogen Genomics
Public Health Practice
19. The Challenges of Genomics & Big Data
Problems of Study Designs & Hidden Biases
“…claims are based upon complex
(and we believe flawed)
analyses…there are far simpler
alternative explanations for the
patterns they observed. We believe
that the authors have not excluded
important alternative explanations”
G. Breen
“Schizophrenia is Eight Different Diseases
Not One” USA Today (9/15/2014)
“Eight types of schizophrenia? Not so
fast” Genomes Unzipped (9/30/2014)
Am J Psychiatry Sep 2014
21. The Challenges of Genomics & Big Data
Analytic Issues: Dealing with Complexity
Prediction of LDL cholesterol response to statin using transcriptomic and
genetic variation. Kyungpil Kim et al. Genome Biology, Sep 2014
22. The Challenges of Genomics & Big Data
Reproducibility
[Figure labels: Lots of Input Variables; Millions of genetic variants; Molecularly defined Disease subsets & precursors]
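One reason reproducibility suffers when millions of genetic variants are tested is simple arithmetic: at a conventional significance level, chance alone produces many "hits". The sketch below assumes one million independent tests with no true associations; the number of tests is an illustrative assumption.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of chance findings when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_variants = 1_000_000  # hypothetical number of variants tested

chance_hits = expected_false_positives(n_variants, alpha=0.05)
strict_threshold = bonferroni_threshold(n_variants)

print(f"expected chance findings at p < 0.05: {chance_hits:,.0f}")
print(f"Bonferroni-corrected threshold: {strict_threshold:.1e}")
```

Bonferroni correction is only one of several ways to control false positives; it is used here because it is the simplest to state, not because the slides prescribe it.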
24. The Challenges of Genomics & Big Data
Causation, Ecologic Fallacies & Hubris
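An ecologic fallacy arises when an association seen across groups is assumed to hold for individuals. The toy data below are constructed by hand (they are not real measurements) so that group averages rise together while, within every group, the individual-level association runs the other way.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical (exposure, outcome) values for individuals in three populations.
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Group-level (ecologic) analysis: correlate the population averages.
mean_x = [sum(x) / len(x) for x, _ in groups.values()]
mean_y = [sum(y) / len(y) for _, y in groups.values()]
r_ecologic = pearson(mean_x, mean_y)

# Individual-level analysis within each population.
r_within = {name: pearson(x, y) for name, (x, y) in groups.items()}

print(f"correlation of group means: {r_ecologic:+.2f}")
for name, r in r_within.items():
    print(f"within group {name}: {r:+.2f}")
```

The group means are perfectly positively correlated even though the within-group correlation is negative in all three populations, so the aggregate pattern cannot be read as an individual-level effect.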
25. ‘The Scientific Method Itself is Growing
Obsolete.’ (A. Butte, Sep 2014)
“..implicit assumption that big data are a substitute for, rather than a
supplement to, traditional data collection and analysis.”
http://blogs.kqed.org/science/audio/how-big-data-is-changing-medicine/
Garbage In, Garbage Out (GIGO)
26. The Challenges of Genomics & Big Data
Beyond Prediction: From Validity to Utility
27. The Challenges of Genomics & Big Data
Challenges of Population Stratification & Precision
Medicine
28. Some Challenges of Genomics & Big Data
Problems of Study Designs & Hidden Biases
Analytic Issues: Dealing with Complexity
Reproducibility and Replication
Causation vs Association-Ecologic Fallacies &
Hubris
Translation: from Validity into Utility and
Implementation
Challenges of Population Stratification &
Personalized Medicine
29. A Public Health Translation Framework
for Genomics & Big Data
[Figure: translation cycle linking Discovery, Application, Evidence-based Recommendation or Policy, Health care & Prevention Programs, and Population Health through phases T0, T1, T2, T3 and T4, with Knowledge Integration at the center. Other labels: Basic, Clinical & Population Sciences; Development; Evaluation; Implementation Science; Effectiveness & Outcomes Research (CER, PCOR, Economics, ELSI)]
Khoury MJ et al, AJPH, 2012
30. A Public Health Approach to Realizing
Promises of Genomics & Big Data
1. Use a Strong
Epidemiologic Foundation
The study of distribution and
determinants of disease occurrence
and outcomes in populations, and
using resulting knowledge to
improve health and prevent disease
Fundamental science of medicine
and public health
Human Genome Epidemiology
(HuGE)- Beyond Gene Discovery
New Brand of “Big Data
Epidemiology” 2010
32. Epidemiologic Cohort Studies:
The NCI Cohort Consortium
• Investigators responsible:
– 40+ high-quality cohorts
– 4+ million people
• Coordinated,
interdisciplinary approach
• Tackle important scientific
questions, economies of
scale, and opportunities to
quicken the pace of
research
• Focused so far mostly on
etiology, but adapting to
include outcomes
• Major role in identifying
specific carcinogenic
environmental agents
▫ Asbestos – Lung
▫ Benzene – Leukemia
▫ Smoking – many diseases
• Exposures/Risk factors
assessment prior to
onset of disease
▫ Overcome
recall/selection biases
• Permit absolute
measures of
risks/incidence rates
▫ Relevant for public
health policies
• Valuable resource for
studying repeated
measures and multiple
outcomes
39. A Public Health Approach to Realizing
Promises of Genomics & Big Data
3. Use (and not avoid) Principles of Evidence-based
Medicine and Population Screening
41. Guidelines We Can Trust in Genomic Medicine
(Schully S et al. Genetics in Medicine 2014)
42. CDC-Sponsored
EGAPP Working Group
• Independent, multidisciplinary, non-federal panel
established in 2004
• Established a systematic, evidence-based process to
assess validity & utility of genomic tests & family health
history applications.
• New methods for evidence synthesis and modeling in 2013,
including next generation sequencing and stratified cancer
screening based on family history
• 10 recommendation statements to date:
• Colorectal cancer, breast cancer, heart disease, clotting
disorders, depression, prostate cancer, diabetes, and more
• Clinical Validity vs Clinical Utility
• Uncovered evidence gaps that require additional
research
• Principles can be applied to other “Big Data”
43. Evidence-based Classification of Genomic
Applications in Practice
Tier 1
Tier 2
Tier 3
http://www.cdc.gov/genomics/gtesting/tier.htm
45. A Public Health Approach to Realizing
Promises of Genomics & Big Data
4. Develop a Robust T2+ Translational
Research Agenda
46. Limited Translational Research in Genomics
Beyond the Bedside
T0 ↔ T1 ↔ T2 ↔ T3 ↔ T4
T1: Discovery to Application
T2: Application to Guideline
T3: Guideline to Practice
T4: Practice to Population Health Impact
Khoury MJ, 2007; Schully, 2012; Clyne M, 2014
<1% of published genomics research
in T2 – T4
Multiple clinical and population
scientific disciplines involved
48. A MultiDisciplinary T2+ Research Agenda
Comparative Effectiveness Research
Patient-centered Outcomes Research
Behavioral, Social & Communication Sciences
Economic Studies
Surveillance & Population Monitoring
49. A Public Health Approach to Realizing
Promises of Genomics & Big Data
Use a Strong Epidemiologic Foundation
Develop a Robust Knowledge Integration
Process
Use (and not avoid) Principles of Evidence-based
Medicine and Population Screening
Develop a Robust T2+ Research Agenda
(Learning Health Systems, Consumer
Involvement, etc.)
50. In Summary
“Big Data” is agnostic to disease causation
Numerous promises for the health impact of genomics
& Big Data; the leading edge of genomics in Big Data
is beginning to be applied
But numerous challenges face genomics & Big
Data, so we should not overpromise & underdeliver
A “Public Health” translational approach is needed
to realize the potential of genomics & Big Data