Big Data and its Role in Biomedical Research
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
10/10/18 ACoP 2018 1
@pebourne
Bias
• Can't help but be influenced by my time as Associate Director for Data Science (ADDS) at NIH
• Now very much engaged in data science across disciplines
– broader but shallower perspective
• My long-time colleague Prof. Lei Xie and others will follow me with a deeper perspective
Let's start with a definition…
Big data and data science are like the Internet… If I asked you to define them, you would all say something different, yet you use them every day…
http://vadlo.com/cartoons.php?id=357
So what do I mean by big data/data science?
• Use of the ever-increasing amount of open, complex, diverse digital data
• Finding ways to ask and then answer relevant questions by combining such diverse data sets
• Arriving at statistically significant conclusions not otherwise obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that improve the human condition
[Figure: QSP - Open, complex, diverse digital data. Model transportability across species (human, mouse, zebrafish); horizontal integration of data types (CNV, SNP, methylation, 3D structure, gene expression, proteomics, metabolomics); multi-scale integration from DNA and gene/protein through networks (metabolic, signal transduction, gene regulation), cells (hepatic, myoepithelial, erythrocyte, epithelial, muscle, nervous), tissues, organs (liver, kidney, pancreas, heart) and the body (physiologically based pharmacokinetics) to populations (GWAS, population dynamics, microbiota).]
Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
Machine learning has been around for over 20 years – why the fuss now?
• Amount of data available for training
• Open-source tools, e.g., R and Python
• Advances in computing (e.g., GPUs) allow for deeper neural nets (deep learning)
• Algorithmic efficiency gains (e.g., in backpropagation)
• Success promotes further research
• Commercialization
Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
The NIH view
• Big Data
– Total data from NIH-funded research in 2016 estimated at 650 PB*
– 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016
• Dark Data
– Only 12% of data described in published papers is in recognized archives – 88% is dark data^
• Cost
– 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
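The percentages on this slide follow from simple arithmetic on the quoted figures. A minimal Python sketch, using only the numbers above (the constant names are illustrative), checks the NCBI/NLM share, the archived fraction, and how many 2012 Libraries of Congress fit into 650 PB:

```python
# Figures quoted on the slide (PB = petabytes).
NIH_TOTAL_PB = 650      # total data from NIH-funded research, 2016 estimate
NCBI_NLM_PB = 20        # portion held at NCBI/NLM
LOC_2012_PB = 3         # Library of Congress, 2012
DARK_FRACTION = 0.88    # share of data described in papers that is not in recognized archives


def share(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


ncbi_share = share(NCBI_NLM_PB, NIH_TOTAL_PB)
loc_multiple = NIH_TOTAL_PB / LOC_2012_PB
archived_fraction = 1.0 - DARK_FRACTION

print(f"NCBI/NLM share of NIH data: {ncbi_share:.1f}%")
print(f"NIH data vs. Library of Congress (2012): {loc_multiple:.0f}x")
print(f"Described data in recognized archives: {archived_fraction:.0%}")
```

Running it reports a 3.1% share (the slide rounds to 3%), 12% of described data in archives, and a factor of roughly 217 relative to the 2012 Library of Congress.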
NIH strategic plan for data
• Support a Highly Efficient and Effective Biomedical Research Data Infrastructure
• Promote Modernization of the Data-Resources Ecosystem
• Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools
• Enhance Workforce Development for Biomedical Data Science
• Enact Appropriate Policies to Promote Stewardship and Sustainability
https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
A research data infrastructure requires that we move from pipes to platforms…
which raises the question...
Vivien Bonazzi
Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
Will biomedical research become more like Airbnb?
I am not crazy, hear me out:
• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– Average of 500,000 stays per night
– Valuation of US $25bn
Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
[Figure, panels (A)–(C): Model Commons. Labels: cloud computing environment; data, metadata, software and model in a container; metadata; model registry; recommendation system; user interface; ontology, model, data, algorithm, software.]
These plans require moving from pipes to platforms
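The Model Commons figure pairs a model registry with a recommendation system over shared metadata. A minimal Python sketch of that idea follows, under explicit assumptions: each model is registered with a container reference and a set of ontology terms, and "recommendation" simply ranks models by Jaccard overlap between a user's query terms and each model's terms. All model names, terms and image references are hypothetical, and this is not the design described by Bonazzi & Bourne.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical registry record: a containerized model plus descriptive metadata."""
    name: str
    container_image: str        # hypothetical container reference
    ontology_terms: frozenset   # hypothetical annotation terms


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two term sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ModelRegistry:
    """Toy model registry with a metadata-based recommender."""

    def __init__(self) -> None:
        self._entries: dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        """Add a model; names must be unique within the registry."""
        if entry.name in self._entries:
            raise ValueError(f"model {entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def recommend(self, query_terms, k: int = 3) -> list[str]:
        """Return up to k model names ranked by term overlap with the query."""
        query = frozenset(query_terms)
        scored = [(jaccard(query, e.ontology_terms), e.name) for e in self._entries.values()]
        # Drop models with no shared terms; sort by score, then name for stable ties.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [name for _, name in ranked[:k]]


registry = ModelRegistry()
registry.register(ModelEntry("pbpk_liver", "registry.example.org/pbpk-liver:1.0",
                             frozenset({"pharmacokinetics", "liver", "human"})))
registry.register(ModelEntry("qsp_glucose", "registry.example.org/qsp-glucose:0.3",
                             frozenset({"metabolism", "pancreas", "human"})))
registry.register(ModelEntry("cardiac_zebrafish", "registry.example.org/cardiac-zf:2.1",
                             frozenset({"heart", "zebrafish"})))

print(registry.recommend({"liver", "human"}))
```

With these three invented entries, a query for {"liver", "human"} ranks pbpk_liver first (two of three terms shared) and qsp_glucose second, while cardiac_zebrafish is excluded. A real platform would presumably rely on curated ontologies and richer signals than set overlap.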
The pillars of data science operate within this platform environment
[Figure: QSP]
Let's briefly focus on those five pillars in the context of QSP…
Data acquisition
The data production issue (the V’s of Big Data): experimentally
• Estimated (2017) that ≈2.5 quintillion (2.5 × 10^18) bytes of data are generated daily, with 90% of all the world’s data having been created in the past two years.
• Plain-text PDB files are typically a few hundred KB (…but that’s just the start!)
Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
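To put the quoted daily volume in perspective, the back-of-envelope Python below converts 2.5 × 10^18 bytes per day into petabytes and compares it with the 650 PB NIH estimate from the earlier slide. The two estimates come from different sources and years, and the 300 KB PDB file size is an assumed value standing in for "a few hundred KB".

```python
# Figures quoted on the slides; the PDB file size is a hypothetical stand-in.
BYTES_PER_DAY = 2.5e18      # ~2.5 quintillion bytes generated worldwide per day (2017 estimate)
BYTES_PER_PB = 1e15         # decimal petabyte
NIH_TOTAL_PB = 650          # NIH-funded research data, 2016 estimate
PDB_FILE_BYTES = 300e3      # hypothetical: "a few hundred KB" taken as 300 KB

# Convert the daily global volume to petabytes.
daily_pb = BYTES_PER_DAY / BYTES_PER_PB

# Hours of global data production needed to equal the NIH estimate.
hours_to_match_nih = 24 * NIH_TOTAL_PB / daily_pb

# How many plain-text PDB-sized files the daily volume would fill.
pdb_files_per_day = BYTES_PER_DAY / PDB_FILE_BYTES

print(f"Global data per day: {daily_pb:,.0f} PB")
print(f"Hours to match NIH's 650 PB: {hours_to_match_nih:.2f}")
print(f"PDB-sized files per day: {pdb_files_per_day:.2e}")
```

At these figures the world produces about 2,500 PB per day, so roughly six hours of global output would equal the entire 650 PB estimated for NIH-funded research.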
Data integration and engineering
• Generic
– Ontologies
– Object identifiers
– Indexing schemes
– Common data models
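To illustrate how object identifiers and a common data model work together, here is a minimal Python sketch. Two hypothetical sources report expression of the same gene under different conventions (a gene symbol versus an Ensembl accession); each is mapped onto one shared record type keyed by a "prefix:accession" identifier, and an index then joins them. The record fields, source names, values and the lookup table are all invented for illustration; a real pipeline would use an identifier-mapping service and an ontology term rather than the plain string "gene_expression".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical common data model: one measurement about one identified object."""
    subject_id: str   # globally unique identifier written as "prefix:accession"
    measure: str      # shared measure name (stand-in for an ontology term)
    value: float
    unit: str
    source: str


# Two hypothetical sources reporting expression of the same gene under different conventions.
lab_a_rows = [{"gene_symbol": "TP53", "tpm": 12.5}]
lab_b_rows = [{"ensembl_gene": "ENSG00000141510", "expression_tpm": 11.9}]

# Hypothetical lookup table standing in for an identifier-mapping service.
SYMBOL_TO_ENSEMBL = {"TP53": "ENSG00000141510"}


def harmonize_lab_a(row: dict) -> Observation:
    """Map a lab A row (gene symbols) onto the common data model."""
    accession = SYMBOL_TO_ENSEMBL[row["gene_symbol"]]
    return Observation(f"ensembl:{accession}", "gene_expression", row["tpm"], "TPM", "lab_a")


def harmonize_lab_b(row: dict) -> Observation:
    """Map a lab B row (Ensembl accessions) onto the common data model."""
    return Observation(f"ensembl:{row['ensembl_gene']}", "gene_expression",
                       row["expression_tpm"], "TPM", "lab_b")


def build_index(observations: list[Observation]) -> dict[str, list[Observation]]:
    """Index observations by identifier so data from different sources can be joined."""
    index = defaultdict(list)
    for obs in observations:
        index[obs.subject_id].append(obs)
    return dict(index)


observations = [harmonize_lab_a(r) for r in lab_a_rows] + [harmonize_lab_b(r) for r in lab_b_rows]
index = build_index(observations)
for subject_id, records in index.items():
    print(subject_id, [(r.source, r.value, r.unit) for r in records])
```

The design choice is that harmonization happens once, at ingest, so every downstream query can join on the shared identifier instead of re-reconciling each source's local naming.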
Data analytics
• Generic
– SVMs
– Neural nets
– Deep learning
– Random forests
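Of the generic methods listed, random forests are the easiest to sketch without external libraries. The toy below is an illustration under simplifying assumptions, not a production implementation: it grows an ensemble of depth-one trees (decision stumps), each fitted on a bootstrap sample and restricted to a randomly chosen feature subset, and predicts by majority vote. The dataset is invented; in practice one would use a library implementation such as scikit-learn's `RandomForestClassifier` or R's randomForest package, which grow full-depth trees.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels, features):
    """Fit a depth-one tree: the (feature, threshold) split with the fewest training errors."""
    fallback = majority(labels, None)
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label, right_label = majority(left, fallback), majority(right, fallback)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Grow stumps on bootstrap samples, each restricted to a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap: sample with replacement
        feats = rng.sample(range(d), n_features)        # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Invented two-feature data: class 0 clusters near (1, 1), class 1 near (5, 5).
X = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.1), (1.2, 1.4),
     (5.0, 5.2), (4.8, 5.5), (5.3, 4.9), (5.1, 5.1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict(forest, (0.5, 0.5)), predict(forest, (6.0, 6.0)))
```

Bootstrapping and per-tree feature sampling are what distinguish a random forest from a single tree: each tree sees a slightly different view of the data, and the vote averages out their individual errors.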
Visualization
• Generic
– VR
– Networks
– Sonification
Ethics, law & policy
• Landmark studies identify histone mutations as recurrent driver mutations in DIPG (~2012)
• Almost 3 years later, in largely the same but partially expanded datasets, the same two groups and 2 others identify ACVR1 mutations as secondary, co-occurring mutations
From Adam Resnick
Diffuse Intrinsic Pontine Glioma (DIPG)
Conclusion:
Driven by large amounts of open digital data of different types, and by new algorithms and approaches, biomedical researchers are destined to follow the private sector towards the fourth paradigm
Acknowledgements
The BD2K Team at NIH
My Colleagues at UVA
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
Zheng Zhao Lei Xie
Thank You
peb6a@virginia.edu

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 

Big Data and its Role in Biomedical Research

  • 1. Big Data and its Role in Biomedical Research Philip E. Bourne PhD, FACMI Stephenson Chair of Data Science Director, Data Science Institute Professor of Biomedical Engineering peb6a@virginia.edu https://www.slideshare.net/pebourne 10/10/18 ACoP 2018 1 @pebourne
  • 2. Bias • Can't help but be influenced by my time as Associate Director for Data Science (ADDS) at NIH • Now very much engaged in data science across disciplines – broader but shallower perspective • Knowing that my long-time colleague Prof. Lei Xie and others will follow me with a deeper perspective
  • 3. Let's start with a definition…
  • 4. Big data and data science are like the Internet… If I asked you to define them, you would all say something different, yet you use them every day… http://vadlo.com/cartoons.php?id=357
  • 5. So what do I mean by big data/data science? • Use of the ever-increasing amount of open, complex, diverse digital data • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  • 6. [Figure: model integration in QSP, showing multi-scale integration (DNA, gene/protein, network, cell, tissue, organ, body, population); horizontal integration across data and model types (e.g., CNV, SNP, methylation; 3D structure, gene expression, proteomics, metabolomics; metabolic, signal transduction, and gene regulation networks; hepatic, myoepithelial, erythrocyte, epithelial, muscle, and nervous cells; liver, kidney, pancreas, heart; physiologically based pharmacokinetics; GWAS, population dynamics, microbiota); and model transportability across species (human, mouse, zebrafish).] QSP: open, complex, diverse digital data. Xie et al. Annu Rev Pharmacol Toxicol. 2017;57:245-262
  • 7. Machine learning has been around for over 20 years – why the fuss now? • Amount of data available for training • Open source: R and Python • Advances in computing (e.g., GPUs) allow for deeper neural nets (deep learning) • Algorithmic efficiency gains (e.g., in back propagation) • Success promotes further research • Commercialization. Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
  • 8. The NIH view • Big Data – Total data from NIH-funded research in 2016 was estimated at 650 PB* – 20 PB of that is in NCBI/NLM (3%), and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data^ • Cost – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives. * In 2012, the Library of Congress was 3 PB. ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
  • 9. NIH strategic plan for data • Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Promote Modernization of the Data-Resources Ecosystem • Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools • Enhance Workforce Development for Biomedical Data Science • Enact Appropriate Policies to Promote Stewardship and Sustainability. https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
  • 10. A research data infrastructure requires that we move from pipes to platform… which raises the question: Will biomedical research become more like Airbnb? Vivien Bonazzi; Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
  • 11. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – An average of 500,000 stays per night – Valuation of US $25bn. Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
  • 12. [Figure, panels (A) to (C): a Model Commons within a cloud computing environment; components shown include data, metadata, software, models, containers, ontologies, algorithms, a model registry, a recommendation system, and a user interface.] These plans require moving from pipes to platforms.
  • 13. The pillars of data science operate within this platform environment. [Figure: QSP]
  • 14. Let's briefly focus on those five pillars in the context of QSP…
  • 15. Data acquisition: the data production issue (the V's of Big Data), experimentally • It was estimated (2017) that ≈2.5 quintillion (2.5×10¹⁸) bytes of data are generated daily, with 90% of all the world's data having been created in the past two years. • Plain-text PDB files are typically a few hundred KB (…but that's just the start!) Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
  • 16. Data integration and engineering • Generic – Ontologies – Object identifiers – Indexing schemes – Common data models
  • 17. Data analytics • Generic – SVMs – Neural nets – Deep learning – Random forest
  • 18. Visualization • Generic – VR – Networks – Sonics
  • 19. Ethics, law & policy • Landmark studies identify histone mutations as recurrent driver mutations in DIPG (~2012) • Almost three years later, using largely the same datasets, partially expanded, the same two groups and two others identify ACVR1 mutations as a secondary, co-occurring mutation. Diffuse Intrinsic Pontine Glioma (DIPG). From Adam Resnick
  • 20. Conclusion: Driven by large amounts of open digital data of different types, and by new algorithms and approaches, biomedical researchers are destined to follow the private sector towards the fourth paradigm.
  • 21. Acknowledgements • The BD2K Team at NIH • My colleagues at UVA • The 150 folks who have passed through my laboratory https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0 • Zheng Zhao • Lei Xie
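Slide 7 credits part of the renewed interest in machine learning to algorithmic efficiency gains in back propagation. The sketch below does not reproduce those gains; it only illustrates what back propagation computes. It hand-derives the gradients of a tiny network (one sigmoid hidden layer, a linear output, squared-error loss) with the chain rule and checks them against central finite differences. The network size, weights, and inputs are arbitrary assumptions chosen for the example, and only the Python standard library is used.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x, y):
    """Return (loss, hidden activations, prediction) for one example.

    w1: one weight vector per hidden unit; w2: one output weight per hidden unit.
    """
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    y_hat = sum(v * hj for v, hj in zip(w2, h))  # linear output unit
    loss = 0.5 * (y_hat - y) ** 2                # squared-error loss
    return loss, h, y_hat


def backprop(w1, w2, x, y):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    _, h, y_hat = forward(w1, w2, x, y)
    d_out = y_hat - y                    # dL/dy_hat
    g_w2 = [d_out * hj for hj in h]      # dL/dw2_j = d_out * h_j
    g_w1 = []
    for j in range(len(w1)):
        # dL/dz_j = d_out * w2_j * sigmoid'(z_j), where sigmoid'(z_j) = h_j * (1 - h_j)
        d_hidden = d_out * w2[j] * h[j] * (1.0 - h[j])
        g_w1.append([d_hidden * xi for xi in x])
    return g_w1, g_w2


def numeric_grad_w1(w1, w2, x, y, j, i, eps=1e-6):
    """Central finite-difference estimate of dL/dw1[j][i]."""
    plus = [row[:] for row in w1]
    minus = [row[:] for row in w1]
    plus[j][i] += eps
    minus[j][i] -= eps
    return (forward(plus, w2, x, y)[0] - forward(minus, w2, x, y)[0]) / (2 * eps)


w1 = [[0.5, -0.3], [0.8, 0.2], [-0.6, 0.4]]  # 3 hidden units, 2 inputs (arbitrary)
w2 = [0.7, -1.2, 0.3]
x, y = [1.0, 2.0], 0.5

g_w1, g_w2 = backprop(w1, w2, x, y)
for j in range(len(w1)):
    for i in range(len(x)):
        num = numeric_grad_w1(w1, w2, x, y, j, i)
        print(f"w1[{j}][{i}] backprop={g_w1[j][i]:+.8f} numeric={num:+.8f}")
```

The two columns should agree to many decimal places. One backward pass yields every gradient at once, while the finite-difference check needs two extra forward passes per weight, which is why back propagation made training larger networks practical.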
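The percentages on slide 8 follow directly from the quoted volumes. The worked arithmetic below uses only figures given on that slide: 650 PB of NIH-funded data in 2016, 20 PB of it at NCBI/NLM, 12% of published data in recognized archives, and the 3 PB size of the Library of Congress in 2012. Note that the last ratio compares figures from different years.

```latex
\frac{20\ \text{PB}}{650\ \text{PB}} \approx 0.031 \approx 3\%,
\qquad
100\% - 12\% = 88\%\ \text{dark data},
\qquad
\frac{650\ \text{PB}}{3\ \text{PB}} \approx 217
```

In other words, the 2016 estimate for NIH-funded data is on the order of two hundred times the 2012 Library of Congress figure.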
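Slide 12 sketches a Model Commons in which models, data, software, and ontologies are registered in a model registry and surfaced through a recommendation system. The slide does not say how recommendations are made, so the sketch below assumes a deliberately simple scheme: registered models are ranked by Jaccard similarity between their ontology annotations and a user's query terms. The class names, model identifiers, and terms are all hypothetical, and the code needs Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class ModelEntry:
    """Hypothetical registry record for one model in the commons."""
    model_id: str
    description: str
    ontology_terms: set[str] = field(default_factory=set)
    container_image: str = ""  # hypothetical reference to a packaged container


class ModelRegistry:
    """A minimal in-memory model registry with a toy recommender."""

    def __init__(self) -> None:
        self._entries: dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        if entry.model_id in self._entries:
            raise ValueError(f"duplicate model id: {entry.model_id}")
        self._entries[entry.model_id] = entry

    def get(self, model_id: str) -> ModelEntry:
        return self._entries[model_id]

    def recommend(self, query_terms: set[str], top_k: int = 3) -> list[tuple[str, float]]:
        """Rank models by Jaccard similarity of ontology terms (an assumed scoring rule)."""
        scored = []
        for entry in self._entries.values():
            union = entry.ontology_terms | query_terms
            if not union:
                continue
            score = len(entry.ontology_terms & query_terms) / len(union)
            if score > 0:
                scored.append((entry.model_id, score))
        # Highest score first; ties broken by id so the order is deterministic.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:top_k]


registry = ModelRegistry()
registry.register(ModelEntry("pbpk-liver-01", "PBPK model of hepatic clearance",
                             {"liver", "pbpk", "clearance"}))
registry.register(ModelEntry("qsp-kinase-02", "QSP model of kinase signaling",
                             {"kinase", "signaling", "qsp"}))
registry.register(ModelEntry("pbpk-kidney-03", "PBPK model of renal excretion",
                             {"kidney", "pbpk"}))
print(registry.recommend({"pbpk", "liver"}))  # liver model first, kidney model second
```

A real commons would rank on far richer signals (usage, provenance, validation status, trust), but separating the registry from the ranking function keeps either side replaceable.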
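Slide 16 names ontologies, object identifiers, indexing schemes, and common data models as generic integration tools. A minimal sketch of how the four fit together, assuming two hypothetical sources that describe the same protein with different field layouts: each record is mapped into one common data model keyed by a prefixed, CURIE-style identifier (here `UniProt:P00533`, human EGFR), and an inverted index over Gene Ontology terms lets both records be found with a single query. The source layouts, field names, and the `from_source_a`/`from_source_b` helpers are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """A hypothetical common data model shared by all sources."""
    identifier: str                 # prefixed object identifier, e.g. "UniProt:P00533"
    label: str
    ontology_terms: frozenset[str]  # ontology annotations, e.g. GO terms
    source: str


def from_source_a(row: dict) -> CommonRecord:
    # Hypothetical source A: bare accession plus a comma-separated term list.
    return CommonRecord(
        identifier=f"UniProt:{row['accession']}",
        label=row["name"],
        ontology_terms=frozenset(t.strip() for t in row["terms"].split(",")),
        source="A",
    )


def from_source_b(row: dict) -> CommonRecord:
    # Hypothetical source B: already uses prefixed ids and a list of terms.
    return CommonRecord(
        identifier=row["id"],
        label=row["protein_label"],
        ontology_terms=frozenset(row["annotations"]),
        source="B",
    )


def build_index(records: list[CommonRecord]) -> dict[str, set[str]]:
    """Inverted index: ontology term -> identifiers annotated with that term."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        for term in rec.ontology_terms:
            index[term].add(rec.identifier)
    return index


records = [
    from_source_a({"accession": "P00533", "name": "EGFR",
                   "terms": "GO:0004714, GO:0005886"}),
    from_source_b({"id": "UniProt:P00533",
                   "protein_label": "Epidermal growth factor receptor",
                   "annotations": ["GO:0004714"]}),
]
index = build_index(records)
print(sorted(index["GO:0004714"]))  # ['UniProt:P00533']
```

Because both sources resolve to the same identifier, the index returns one entity rather than two unrelated rows; without the shared identifier scheme the integration step would have to guess that "EGFR" and "Epidermal growth factor receptor" are the same thing.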
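Slide 17 lists support vector machines among the generic analytics methods. As one dependency-free illustration (in practice a library such as scikit-learn or LIBSVM would be used), the sketch below trains a linear SVM without a bias term using the Pegasos stochastic sub-gradient method. To keep the run reproducible it cycles through the data in a fixed order instead of sampling at random. The toy data, the regularization parameter `lam`, and the epoch count are assumptions.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(xs, ys, lam=0.1, epochs=50):
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)) with no bias term.
    Labels must be +1 or -1.
    """
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):      # fixed order for reproducibility
            t += 1
            eta = 1.0 / (lam * t)     # Pegasos step size
            margin = y * dot(w, x)    # evaluated with the current w
            # Shrink w (gradient step on the regularizer)...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ...and, if the hinge loss is active, step toward the correct side.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


xs = [[2.0, 2.0], [3.0, 3.5], [-2.0, -1.5], [-3.0, -3.0]]
ys = [1, 1, -1, -1]
w = train_linear_svm(xs, ys)
print([predict(w, x) for x in xs])  # [1, 1, -1, -1]
```

The toy data are linearly separable through the origin, which is why a bias-free model suffices; real biomedical feature sets usually need a bias term, feature scaling, and often a kernel.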

Editor's Notes

  1. Model integration in systems pharmacology. Diverse models need to be integrated across multiple methodologies, multiple heterogeneous data sets, organismal hierarchy, and species (transportability).
  2. $1.25bn per year to capture all data. Even after a significant effort at reduction, intramural data are spread across more than 60 data centers; imagine the extramural situation.
  3. Distribution of kinases and the number of covalent small-molecule kinase inhibitors (CSKIs) for every targeted kinase across the human kinome.