Separating Signal from Noise in the Age of 
Genomics & Big Data: 
A Public Health Approach 
Muin J. Khoury MD, PhD 
CDC Office of Public Health Genomics 
NCI Epidemiology & Genomics Research Program
Outline 
 Big Data & Causation in the Age of Genomics 
 Promises of Genomics & Big Data 
 Challenges of Genomics & Big Data 
 A Public Health Approach to Realize Potential of 
Genomics & Big Data
A Case Study: Searching for Needles in the 
Haystack- The CDC HuGE Navigator 
http://www.hugenavigator.net/HuGENavigator/home.do
Text Mining Tool To Find HuGE Articles 
in Published Literature 
 PubMed Signal/Noise ratio very 
low 
 Support Vector Machine (SVM) 
tool generated in 2008 
 Based on >3800 words in text, 
extensively validated 
 Sensitivity & specificity >97% 
 Since 2008, genetic epidemiology 
literature has changed 
considerably 
 Performance of SVM model was 
significantly reduced (60%) 
 In 2014, Retrained SVM now using 
> 4500 words pushed sensitivity 
and specificity to >90% 
Yu W et al. BMC Bioinformatics, 2008
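Sensitivity and specificity, as quoted for the SVM classifier above, come from a confusion matrix of classifier calls against a curated gold standard. A minimal sketch of that arithmetic follows; the counts and the function name are hypothetical, chosen only for illustration, and are not taken from Yu W et al.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity: share of true HuGE articles that the classifier flags
    sensitivity = tp / (tp + fn)
    # Specificity: share of non-HuGE articles that the classifier rejects
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (not from the cited paper)
sens, spec = sensitivity_specificity(tp=97, fn=3, tn=980, fp=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Recomputing both numbers on a fresh, recently labeled sample is one way to notice the kind of performance drift the slide describes as the literature changes.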
Application of Data Mining in the Prediction 
of Type 2 Diabetes in the United States 
 1999-2004 National Health and 
Nutrition Examination Survey 
 Developed and validated SVM 
models for diabetes, undiagnosed 
diabetes & prediabetes using 
numerous variables in survey 
 Discriminative ability: area under the 
ROC curve of 84% and 73% 
 Validated known risk factors for 
diabetes 
 Not clear which models and variables 
are best, or how applicable results 
are to other populations 
 Proof of concept only 
Yu W et al. BMC Medical Informatics 2010
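The area under the ROC curve reported above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case. A minimal sketch of that pairwise definition follows; the scores are made up for illustration and the function name is hypothetical. It is not the authors' evaluation code.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case score and a non-case score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up model scores for illustration only
pos = [0.9, 0.8, 0.4]       # people with the condition
neg = [0.7, 0.3, 0.2, 0.1]  # people without the condition
auc = roc_auc(pos, neg)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The measure describes discrimination only and says nothing about causation or clinical utility.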
The IOM Ecological Model & the Need for 
Multilevel Analysis of “Causation” 
Obesity Example NEJM 2007;357:404-7 
IOM Ecological Model
Genomics & Big Data 
The Genome is Just the Beginning 
“We will all be surrounded by a personal cloud of billions of data points” 
L. Hood (ISB)
Big Data: From Association to Prediction 
How about Causation? 
 Association 
 Replication 
 Classification 
 Prediction 
 ?CAUSATION 
 Does Big Data care about “Causation”? 
 Intervention is based on cause-effect 
relationships
The Promises of Genomics & Big Data 
The Economist
The Promises of Genomics & Big Data 
 Workup of Rare & Familial Diseases 
NEJM June 2014
The Promises of Genomics & Big Data 
 Improved Disease Classification
The Promises of Genomics & Big Data 
 Improved Measurement of the “Environment” 
http://www.niehs.nih.gov/research/programs/geh/geh_newsletter/2014/4/spotlight/index.cfm
The Promises of Genomics & Big Data 
 Better Understanding of Natural History 
G Ginsburg
The Promises of Genomics & Big Data 
 Stratified Prevention (One size does not fit All) 
No one is average: “population medicine: let’s get over it” (E. Topol)
The Promises of Genomics & Big Data 
 Precision Medicine
The Promises of Genomics & Big Data 
 Pathogen Genomics
The Promises of Genomics & Big Data 
 Public Health Practice 
“As cholera swept through London in the 
mid-19th century, a physician named John 
Snow painstakingly drew a paper map 
indicating clusters of homes where the 
deadly waterborne infection had struck. In 
an iconic feat in public health history, he 
implicated the Broad Street pump as the 
source of the scourge—a founding event in 
modern epidemiology. Today, Snow might 
have crunched GPS information and disease 
prevalence data and solved the problem 
within hours” 
http://www.hsph.harvard.edu/news/magazine/big-datas-big-visionary/?utm_source=SilverpopMailing&utm_medium=email&utm_campaign=Kiosk%2009.25.14_academic%20(1)&utm_content
Some Promises of Genomics & Big Data 
 Workup of Rare & Familial Diseases 
 Improved Disease Classification 
 Improved Measurement of the “Environment” 
 Better Understanding of Disease Natural History 
 Stratified Prevention 
 Precision Medicine 
 Pathogen Genomics 
 Public Health Practice
The Challenges of Genomics & Big Data 
 Problems of Study Designs & Hidden Biases 
“…claims are based upon complex 
(and we believe flawed) 
analyses…there are far simpler 
alternative explanations for the 
patterns they observed. We believe 
that the authors have not excluded 
important alternative explanations” 
G. Breen 
“Schizophrenia is Eight Different Diseases 
Not One” USA Today (9/15/2014) 
“Eight types of schizophrenia? Not so 
fast” Genomes Unzipped (9/30/2014) 
Am J Psychiatry Sep 2014
The Challenges of Genomics & Big Data 
 Analytic Issues: Dealing with Complexity 
Prediction of LDL cholesterol response to statin using transcriptomic and 
genetic variation. Kyungpil Kim et al. Genome Biology, Sep 2014
The Challenges of Genomics & Big Data 
 Reproducibility 
Lots of input variables 
Molecularly defined disease subsets & precursors 
Millions of genetic variants
Am J Clin Nutrition 2013
The Challenges of Genomics & Big Data 
 Causation, Ecologic Fallacies & Hubris
‘The Scientific Method Itself is Growing 
Obsolete.’ (A. Butte, Sep 2014) 
“...implicit assumption that big data are a substitute for, rather than a 
supplement to, traditional data collection and analysis.” 
http://blogs.kqed.org/science/audio/how-big-data-is-changing-medicine/ 
Garbage In, Garbage Out (GIGO)
The Challenges of Genomics & Big Data 
 Beyond Prediction: From Validity to Utility
The Challenges of Genomics & Big Data 
 Challenges of Population Stratification & Precision 
Medicine
Some Challenges of Genomics & Big Data 
 Problems of Study Designs & Hidden Biases 
 Analytic Issues: Dealing with Complexity 
 Reproducibility and Replication 
 Causation vs Association-Ecologic Fallacies & 
Hubris 
 Translation: from Validity into Utility and 
Implementation 
 Challenges of Population Stratification & 
Personalized Medicine
A Public Health Translation Framework 
for Genomics & Big Data 
[Figure: translation cycle with Knowledge Integration at the center and phases T0 to T4. 
Stage labels: Discovery; Application; Evidence-based Recommendation or Policy; 
Health care & Prevention Programs; Population Health. 
Research labels: Basic, Clinical & Population Sciences; Development; Evaluation; 
Implementation Science; Effectiveness & Outcomes Research (CER, PCOR, Economics, ELSI).] 
Khoury MJ et al, AJPH, 2012
A Public Health Approach to Realizing 
Promises of Genomics & Big Data 
 1. Use a Strong 
Epidemiologic Foundation 
 The study of distribution and 
determinants of disease occurrence 
and outcomes in populations, and 
using resulting knowledge to 
improve health and prevent disease 
 Fundamental science of medicine 
and public health 
 Human Genome Epidemiology 
(HuGE)- Beyond Gene Discovery 
 New Brand of “Big Data 
Epidemiology” 2010
Epidemiologic Cohort Studies: 
The NCI Cohort Consortium 
• Investigators responsible: 
– 40+ high-quality cohorts 
– 4+ million people 
• Coordinated, interdisciplinary approach 
• Tackle important scientific questions, economies of scale, and 
opportunities to quicken the pace of research 
• Focused so far mostly on etiology, but adapting to include outcomes 
• Major role in identifying specific carcinogenic environmental agents 
▫ Asbestos – Lung 
▫ Benzene – Leukemia 
▫ Smoking – many diseases 
• Exposure/risk factor assessment prior to onset of disease 
▫ Overcome recall/selection biases 
• Permit absolute measures of risks/incidence rates 
▫ Relevant for public health policies 
• Valuable resource for studying repeated measures and multiple outcomes
Epidemiology Data Sharing & Harmonization 
Nature, August 27, 2014
A Public Health Approach to Realizing 
Promises of Genomics & Big Data 
 2. Develop a Robust Knowledge Integration 
Process
Components of Knowledge Integration 
• Knowledge Management: Integration of 
knowledge from disparate sources & disciplines 
• Knowledge Synthesis: Systematic synthesis 
of scientific findings 
▫ Accumulating evidence on a cancer outcome → Minimize waste in repeat funding 
▫ Identify scientific gaps → Inform research priorities 
• Knowledge Translation 
▫ Stakeholder engagement 
▫ Evidence-based information 
▫ Decision support tools
Interpretation 
“The Bottleneck for Realizing Personalized Medicine” 
(Good et al. Genome Biology Sep 2014)
The NIH BD2K Initiative Can Help
A Public Health Approach to Realizing 
Promises of Genomics & Big Data 
 3. Use (and not avoid) Principles of Evidence-based 
Medicine and Population Screening
Guidelines We Can Trust (IOM, 2011)
Guidelines We Can Trust in Genomic Medicine 
(Schully S et al. Genetics in Medicine 2014)
CDC-Sponsored 
EGAPP Working Group 
• Independent, multidisciplinary, non-federal panel 
established in 2004 
• Established a systematic, evidence-based process to 
assess validity & utility of genomic tests & family health 
history applications. 
• New methods for evidence synthesis and modeling in 2013, 
including next generation sequencing and stratified cancer 
screening based on family history 
• 10 recommendation statements to date: 
• Colorectal cancer, breast cancer, heart disease, clotting 
disorders, depression, prostate cancer, diabetes, and more 
• Clinical Validity vs Clinical Utility 
• Uncovered evidence gaps that require additional 
research 
• Principles can be applied to other “Big Data”
Evidence-based Classification of Genomic 
Applications in Practice 
Tier 1 
Tier 2 
Tier 3 
http://www.cdc.gov/genomics/gtesting/tier.htm
Evidence-based Binning of the Genome 
Genetics in Medicine 2011
A Public Health Approach to Realizing 
Promises of Genomics & Big Data 
 4. Develop a Robust T2+ Translational 
Research Agenda
Limited Translational Research in Genomics 
Beyond the Bedside 
T0 ↔ T1 ↔ T2 ↔ T3 ↔ T4 
T1: Discovery to Application 
T2: Application to Guideline 
T3: Guideline to Practice 
T4: Practice to Population Health Impact 
Khoury MJ, 2007; Schully, 2012; Clyne M, 2014 
<1% of published genomics research 
in T2 – T4 
Multiple clinical and population 
scientific disciplines involved
Cancer Genomics Research Funding T2+ 
Public Health Genomics 2010
A MultiDisciplinary T2+ Research Agenda 
 Comparative Effectiveness Research 
 Patient-centered Outcomes Research 
 Behavioral, Social & Communication Sciences 
 Economic Studies 
 Surveillance & Population Monitoring
A Public Health Approach to Realizing 
Promises of Genomics & Big Data 
 Use a Strong Epidemiologic Foundation 
 Develop a Robust Knowledge Integration 
Process 
 Use (and not avoid) Principles of Evidence-based 
Medicine and Population Screening 
 Develop a Robust T2+ Research Agenda 
(Learning Health Systems, Consumer 
Involvement, etc.)
In Summary 
 “Big Data” is agnostic to disease causation 
 Numerous promises for health impact of genomics 
& Big Data; the leading edge of genomics & Big Data 
is beginning to be applied 
 But numerous challenges face genomics & Big 
Data, so we should not overpromise & 
underdeliver 
 A “Public Health” translational approach is needed 
to realize potential of genomics & Big Data

Khoury ashg2014

  • 1. Separating Signal from Noise in the Age of Genomics & Big Data: A Public Health Approach Muin J. Khoury MD, PhD CDC Office of Public Health Genomics NCI Epidemiology & Genomics Research Program
  • 2. Outline  Big Data & Causation in the Age of Genomics  Promises of Genomics & Big Data  Challenges of Genomics & Big Data  A Public Health Approach to Realize Potential of Genomics & Big Data
  • 3. A Case Study: Searching for Needles in the Haystack- The CDC HuGE Navigator http://www.hugenavigator.net/HuGENavigator/home.do
  • 4. Text Mining Tool To Find HuGE Articles in Published Literature  PubMed Signal/Noise ratio very low  Support Vector Machine (SVM) tool generated in 2008  Based on >3800 words in text, extensively validated  Sensitivity & specificity >97%  Since 2008, genetic epidemiology literature has changed considerably  Performance of SVM model was significantly reduced (60%)  In 2014, Retrained SVM now using > 4500 words pushed sensitivity and specificity to >90% Yu W et al. BMC Bioinformatics, 2008
  • 5. Application of Data Mining in the Prediction of Type 2 Diabetes in the United States  1999-2004 National Health and Nutrition Examination Survey  Developed and validated SVM models for diabetes, undiagnosed diabetes & prediabetes using numerous variables in survey  Discriminative abilities Using area under ROC curve of 84% and 73%  Validated known risk factors for diabetes  Not clear what best models, what best variables to use and how applicable to other populations  Proof of concept only Yu W et al. BMC Medical Informatics 2010
  • 6. The IOM Ecological Model & the Need for Multilevel Analysis of “Causation” Obesity Example NEJM 2007;357:404-7 IOM Ecological Model
  • 7. Genomics & Big Data The Genome is Just the Beginning “We will all be surrounded by a personal cloud of billions of data pointsl“ L Hood (ISB)
  • 8. Big Data: From Association to Prediction How about Causation?  Association  Replication  Classification  Prediction  ?CAUSATION  Does Big Data care about “Causation”?  Intervention is based on cause-effect relationships
  • 9. The Promises of Genomics & Big Data The Economist
  • 10. The Promises of Genomics & Big Data  Workup of Rare & Familial Diseases NEJM June2014
  • 11. The Promises of Genomics & Big Data  Improved Disease Classification
  • 12. The Promises of Genomics & Big Data  Improved Measurement of the “Environment” http://www.niehs.nih.gov/research/programs/geh/geh_newsletter/2014/4/spotlight/index.cfm
  • 13. The Promises of Genomics & Big Data  Better Understanding of Natural History G Ginsburg
  • 14. The Promises of Genomics & Big Data  Stratified Prevention (One size does not fit All) No one is average: “population medicine: let’s get over it” (E. Topol)
  • 15. The Promises of Genomics & Big Data  Precision Medicine
  • 16. The Promises of Genomics & Big Data  Pathogen Genomics
  • 17. The Promises of Genomics & Big Data  Public Health Practice “As cholera swept through London in the mid-19th century, a physician named John Snow painstakingly drew a paper map indicating clusters of homes where the deadly waterborne infection had struck. In an iconic feat in public health history, he implicated the Broad Street pump as the source of the scourge—a founding event in modern epidemiology. Today, Snow might have crunched GPS information and disease prevalence data and solved the problem within hours” http://www.hsph.harvard.edu/news/magazine/big-datas-big-visionary/? utm_source=SilverpopMailing&utm_medium=email&utm_cam paign=Kiosk%2009.25.14_academic%20(1)&utm_content
  • 18. Some Promises of Genomics & Big Data  Workup of Rare & Familial Diseases  Improved Disease Classification  Improved Measurement of the “Environment”  Better Understanding of Disease Natural History  Stratified Prevention  Precision Medicine  Pathogen Genomics  Public Health Practice
  • 19. The Challenges of Genomics & Big Data  Problems of Study Designs & Hidden Biases “…claims are based upon complex (and we believe flawed) analyses…there are far simpler alternative explanations for the patterns they observed. We believe that the authors have not excluded important alternative explanations“ G. Breen Schizophrenia is Eight Different Diseases Not One” USA Today (9/15/2014) “Eight types of schizophrenia? Not so fast” Genomes Unzipped (9/30/2014) Am J Psychiatry Sep 2014
  • 20.
  • 21. The Challenges of Genomics & Big Data  Analytic Issues: Dealing with Complexity Prediction of LDL cholesterol response to statin using transcriptomic and genetic variation. Kyungpil Kim et al. Genome Biology, Sep 2014
  • 22. The Challenges of Genomics & Big Data  Reproducibility Lots of Input Variables Molecularly defined Disease subsets & precursors Millions of genetic variants
  • 23. Am J Clin Nutrition 2013
  • 24. The Challenges of Genomics & Big Data  Causation, Ecologic Fallacies & Hubris
  • 25. ‘The Scientific Method Itself is Growing Obsolete.’ (A. Butte, Sep 2014) “..implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis." http://blogs.kqed.org/science/ audio/how-big-data-is-changing- medicine/ Garbage In, Garbage Out (GIGO)
  • 26. The Challenges of Genomics & Big Data  Beyond Prediction: From Validity to Utility
  • 27. The Challenges of Genomics & Big Data  Challenges of Population Stratification & Precision Medicine
  • 28. Some Challenges of Genomics & Big Data  Problems of Study Designs & Hidden Biases  Analytic Issues: Dealing with Complexity  Reproducibility and Replication  Causation vs Association-Ecologic Fallacies & Hubris  Translation: from Validity into Utility and Implementation  Challenges of Population Stratification & Personalized Medicine
  • 29. A Public Health Translation Framework for Genomics & Big Data (Khoury MJ et al, AJPH, 2012) [Figure: translation cycle with phases T0, T1, T2, T3 and T4 around a central Knowledge Integration hub. Labels in the figure: Discovery; Development; Evaluation; Implementation Science; Effectiveness & Outcomes Research (CER, PCOR, Economics, ELSI); Basic, Clinical & Population Sciences; Application; Evidence-based Recommendation or Policy; Health Care & Prevention Programs; Population Health]
  • 30. A Public Health Approach to Realizing Promises of Genomics & Big Data  1. Use a Strong Epidemiologic Foundation  The study of distribution and determinants of disease occurrence and outcomes in populations, and using resulting knowledge to improve health and prevent disease  Fundamental science of medicine and public health  Human Genome Epidemiology (HuGE)- Beyond Gene Discovery  New Brand of “Big Data Epidemiology” 2010
  • 31.
  • 32. Epidemiologic Cohort Studies: The NCI Cohort Consortium • Investigators responsible: – 40+ high-quality cohorts – 4+ million people • Coordinated, interdisciplinary approach • Tackle important scientific questions, economies of scale, and opportunities to quicken the pace of research • Focused so far mostly on etiology, but adapting to include outcomes • Major role in identifying specific carcinogenic environmental agents ▫ Asbestos – Lung ▫ Benzene – Leukemia ▫ Smoking – many diseases • Exposure/risk factor assessment prior to onset of disease ▫ Overcomes recall/selection biases • Permit absolute measures of risk/incidence rates ▫ Relevant for public health policies • Valuable resource for studying repeated measures and multiple outcomes
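The point that cohort studies "permit absolute measures of risks/incidence rates" can be made concrete with the standard definitions. The sketch below is illustrative only: the function names and all counts are hypothetical and are not taken from the NCI Cohort Consortium.

```python
def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per unit of person-time at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years


def cumulative_incidence(new_cases: int, n_at_risk: int) -> float:
    """Absolute risk: proportion of an initially disease-free group that
    develops the disease during follow-up."""
    if n_at_risk <= 0:
        raise ValueError("n_at_risk must be positive")
    return new_cases / n_at_risk


def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    """Absolute excess risk among the exposed."""
    return risk_exposed - risk_unexposed


def relative_risk(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of risks; a relative measure, shown for contrast."""
    return risk_exposed / risk_unexposed


if __name__ == "__main__":
    # Hypothetical cohort: 1,000 exposed and 1,000 unexposed people
    risk_exp = cumulative_incidence(30, 1_000)
    risk_unexp = cumulative_incidence(10, 1_000)
    print("risk difference:", risk_difference(risk_exp, risk_unexp))
    print("relative risk:", relative_risk(risk_exp, risk_unexp))

    # Hypothetical follow-up: 150 cases over 50,000 person-years
    rate = incidence_rate(150, 50_000)
    print("rate per 100,000 person-years:", rate * 100_000)
```

A case-control design can estimate relative measures such as the odds ratio, but the absolute quantities above need the denominators (people at risk, person-time) that a cohort provides.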
  • 33. Epidemiology Data Sharing & Harmonization Nature, August 27, 2014
  • 34. A Public Health Approach to Realizing Promises of Genomics & Big Data  2. Develop a Robust Knowledge Integration Process
  • 35. A Public Health Approach to Realizing Promises of Genomics & Big Data  2. Develop a Robust Knowledge Integration Process
  • 36. Components of Knowledge Integration • Knowledge Management: Integration of knowledge from disparate sources & disciplines • Knowledge Synthesis: Systematic synthesis of scientific findings ▫ Accumulating evidence on a cancer outcome → Minimize waste in repeat funding ▫ Identify scientific gaps → Inform research priorities • Knowledge Translation ▫ Stakeholder engagement ▫ Evidence-based information ▫ Decision support tools
  • 37. Interpretation: “The Bottleneck for Realizing Personalized Medicine” (Good et al. Genome Biology Sep 2014)
  • 38. The NIH BD2K Initiative Can Help
  • 39. A Public Health Approach to Realizing Promises of Genomics & Big Data  3. Use (and not avoid) Principles of Evidence-based Medicine and Population Screening
  • 40. Guidelines We Can Trust (IOM, 2011)
  • 41. Guidelines We Can Trust in Genomic Medicine (Schully S et al. Genetics in Medicine 2014)
  • 42. CDC-Sponsored EGAPP Working Group • Independent, multidisciplinary, non-federal panel established in 2004 • Established a systematic, evidence-based process to assess validity & utility of genomic tests & family health history applications. • New methods for evidence synthesis and modeling in 2013, including next generation sequencing and stratified cancer screening based on family history • 10 recommendation statements to date: • Colorectal cancer, breast cancer, heart disease, clotting disorders, depression, prostate cancer, diabetes, and more • Clinical Validity vs Clinical Utility • Uncovered evidence gaps that require additional research • Principles can be applied to other “Big Data”
  • 43. Evidence-based Classification of Genomic Applications in Practice Tier 1 Tier 2 Tier 3 http://www.cdc.gov/genomics/gtesting/tier.htm
  • 44. Evidence-based Binning of the Genome Genetics in Medicine 2011
  • 45. A Public Health Approach to Realizing Promises of Genomics & Big Data  4. Develop a Robust T2+ Translational Research Agenda
  • 46. Limited Translational Research in Genomics Beyond the Bedside T0 ↔ T1 ↔ T2 ↔ T3 ↔ T4: Discovery to Application; Application to Guideline; Guideline to Practice; Practice to Population Health Impact. <1% of published genomics research in T2 – T4. Multiple clinical and population scientific disciplines involved. (Khoury MJ, 2007; Schully, 2012; Clyne M, 2014)
  • 47. Cancer Genomics Research Funding T2+ Public Health Genomics 2010
  • 48. A MultiDisciplinary T2+ Research Agenda  Comparative Effectiveness Research  Patient-centered Outcomes Research  Behavioral, Social & Communication Sciences  Economic Studies  Surveillance & Population Monitoring
  • 49. A Public Health Approach to Realizing Promises of Genomics & Big Data  Use a Strong Epidemiologic Foundation  Develop a Robust Knowledge Integration Process  Use (and not avoid) Principles of Evidence-based Medicine and Population Screening  Develop a Robust T2+ Research Agenda (Learning Health Systems, Consumer Involvement, etc.)
  • 50. In Summary  “Big Data” is agnostic to disease causation  Numerous promises for health impact of genomics & Big Data; leading-edge applications of genomics and Big Data are beginning to be applied  But numerous challenges face genomics & Big Data, so we should not overpromise & underdeliver  A “Public Health” translational approach is needed to realize potential of genomics & Big Data