Data Science Meets Structural
Biology
Philip E. Bourne, Cam Mura & Eli Draizen
(Open Team Science)
https://www.slideshare.net/pebourne
08/31/18 DSI Lunch & Learn 1
https://arxiv.org/abs/1807.09247
We are more interested in having a
discussion than giving a lecture …
Let's start with a couple of definitions…
What Do We Mean by Data Science?
• Use of the ever-increasing amount of open,
complex, diverse digital data
• Finding ways to ask and then answer relevant
questions by combining such diverse data sets
• Arriving at statistically significant conclusions
not otherwise obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that
improve the human condition
What Do We Mean by Structural
Biology?
Structure… What’s it good for??
Classic structural biology example
A point mutation (E6→V) in the Hb β globin chain results in sickle
cell anemia
Structural biology success stories
Atomic-resolution studies of cellular-scale systems have become
increasingly possible, with immense explanatory power!
[Figure: example structures, with labels: microtubule; 1960-70s, 1986, early 1990s, mid-1990s, ~2002]
Why Do We Care About this
Intersection?
Stepping back…
Data are transforming how we think about
everything, including biomedical research…
Most folks just do not realize it yet…
Your reading of this slide relies on structural
biology (a photoreceptor called rhodopsin!)
Example - Photography
Axes: Time; Volume, Velocity, Variety
• Digitization: Digital camera invented by Kodak but shelved
• Deception: Megapixels & quality improve slowly; Kodak slow to react
• Disruption: Film market collapses; Kodak goes bankrupt
• Demonetization: Phones replace cameras
• Dematerialization: Instagram, Flickr become the value proposition
• Democratization: Digital media becomes a bona fide form of communication
From a presentation to the Advisory Board to the NIH Director
How is the DSI Responding to this Change?
• Societal good
• Interdisciplinary
• Practical experience
• Ethical conduct
• Openness and transparency
Surge in publications involving machine
learning in the biosciences ('J-curve')
Example of Why More Openness:
Diffuse Intrinsic Pontine Gliomas (DIPG)
• Occur 1:100,000
individuals
• Peak incidence 6-8 years
of age
• Median survival 9-12
months
• Surgery is not an option
• Chemotherapy ineffective
and radiotherapy benefit only
transient
From Adam Resnick
Timeline of genomic studies in DIPG
• Landmark studies identify
histone mutations as
recurrent driver mutations in
DIPG ~2012
• Almost 3 years later, in
largely the same datasets,
but partially expanded, the
same two groups and 2
others identify ACVR1
mutations as a secondary, co-
occurring mutation
From Adam Resnick
What do we need to do differently to
reveal ACVR1?
• ACVR1 is a targetable kinase
• Inhibition of ACVR1 inhibited tumor
progression in vitro
• ~300 DIPG patients a year
• ~60 are predicted to have ACVR1 mutations
• If only large-scale data sets had been
integrated with TCGA and/or rare
disease data in 2012, ACVR1 mutations
would have been identified then
• 60 patients/year × 3 years = 180
children’s lives (who likely succumbed to
the disease during that time) could have
been impacted if only data were FAIR
From Adam Resnick
Working across the Grounds
to break down traditional silos
Hallmarks Reflecting Those Principles
• Sustainable
• Designing for where the academical village meets Google – an
ecosystem in which students, faculty, staff, visitors, private sector
reps, entrepreneurs live and work
• Open UVA and open data – Wikimedian in Residence
• Collaboration
– Dual degrees
– Research projects across disciplines
– Sister institutions
• MS DS focusing on practical training
• PhD program
• Undergraduate major
• Undergraduate certificate
Under development
DSI Organization
Structural Biology is one of Many Cross Cutting Initiatives
• Data Acquisition
• Data Integration & Engineering
• Machine Learning & Analytics
• Visualization & Dissemination
• Ethics, Law, Policy, Social Implications
Structural Biology
Structural Biology mapped onto the five pillars of Data Science
Structural Biology
Let's Briefly Focus on those Five Points
of Intersection in the Context of
Structural Biology …
Data Acquisition
The data production issue (the V’s of Big Data)— Experimentally
• An estimated (2017) ≈2.5 quintillion (2.5×10^18) bytes of data are generated daily, with 90%
of all the world’s data having been created in the past two years.
• Plaintext PDB files are typically ≈ a few hundred KB (…but that’s just the start!)
The data production issue (the V’s of Big Data)— Computationally
• Here are some 2D RMSD matrices from a µs-scale biomolecular simulation.
• Half a mole of calculations (1 mole ≈ 6.02×10^23)!
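To make the 2D RMSD matrices mentioned above concrete, here is a minimal sketch of how one could be computed. It is an illustration only, not the code behind the slide: the function names and the toy random-walk trajectory are hypothetical, and the frames are assumed to be already superimposed (no alignment step is performed).

```python
import math
import random


def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two frames of (x, y, z) tuples.

    Assumes both frames list the same atoms in the same order and are
    already superimposed; no alignment is performed here.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(frame_a, frame_b)
    )
    return math.sqrt(squared / len(frame_a))


def rmsd_matrix(trajectory):
    """Symmetric N x N matrix of pairwise RMSDs for a list of N frames."""
    n = len(trajectory)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            matrix[i][j] = matrix[j][i] = rmsd(trajectory[i], trajectory[j])
    return matrix


def toy_trajectory(n_frames=5, n_atoms=10, seed=0):
    """Hypothetical stand-in for simulation output: a random walk of atoms."""
    rng = random.Random(seed)
    frame = [(rng.random(), rng.random(), rng.random()) for _ in range(n_atoms)]
    frames = [frame]
    for _ in range(n_frames - 1):
        frame = [
            (x + rng.gauss(0, 0.1), y + rng.gauss(0, 0.1), z + rng.gauss(0, 0.1))
            for x, y, z in frame
        ]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    for row in rmsd_matrix(toy_trajectory()):
        print(" ".join(f"{value:5.2f}" for value in row))
```

The number of frame pairs grows quadratically with trajectory length, which is one reason long simulations generate so much derived data.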
The data reduction issue (the V’s of Big Data)— Computationally
• The produce/spawn/consume idiom (MapReduce)
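As a rough illustration of the produce/spawn/consume idiom, the sketch below splits a list of per-frame values into chunks, maps each chunk to a small partial summary in a worker pool, and reduces the partial results into one answer. All names here are hypothetical, the thread pool is just one convenient stand-in for spawning workers, and a non-empty input is assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def produce_chunks(values, chunk_size):
    """Produce: split a long list of per-frame values into fixed-size chunks."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Map: reduce one chunk to a small partial summary (count, sum, max)."""
    return (len(chunk), sum(chunk), max(chunk))


def combine(left, right):
    """Reduce: merge two partial summaries into one."""
    return (left[0] + right[0], left[1] + right[1], max(left[2], right[2]))


def summarize(values, chunk_size=1000, workers=4):
    """Spawn workers over the chunks, then consume their partial results.

    Assumes `values` is non-empty.
    """
    chunks = produce_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    count, total, largest = reduce(combine, partials)
    return {"count": count, "mean": total / count, "max": largest}


if __name__ == "__main__":
    print(summarize(list(range(10)), chunk_size=3))
```

Because each partial summary is tiny compared with its chunk, only the reduced values need to be kept, which is the data-reduction point of the idiom.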
Data Integration and
Engineering
• Data are structured
– Ontologies
– Object identifiers
– Indexing schemes
– Common data models
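One way to picture how these pieces fit together is a small common data model in which every record carries an object identifier and a set of ontology terms, with an index built over the terms. This is only a sketch: the class, field names, and placeholder identifiers are hypothetical and do not correspond to any particular database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StructureRecord:
    """Hypothetical common data model for one structure entry."""
    identifier: str                      # object identifier (placeholder values below)
    title: str
    ontology_terms: frozenset = field(default_factory=frozenset)


def build_term_index(records):
    """Indexing scheme: map each ontology term to the identifiers annotated with it."""
    index = {}
    for record in records:
        for term in record.ontology_terms:
            index.setdefault(term, set()).add(record.identifier)
    return index


if __name__ == "__main__":
    records = [
        StructureRecord("ID-0001", "example entry A", frozenset({"TERM:1", "TERM:2"})),
        StructureRecord("ID-0002", "example entry B", frozenset({"TERM:2"})),
    ]
    print(build_term_index(records))
```

With shared identifiers and terms, records from different sources can be joined on the index rather than matched by free text.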
Machine Learning &
Analytics
• Structure->Function
• Sequence->Structure
Protein•Protein
Protein•Ligand
Binding sites
• Neural nets
• Deep learning
• Deep Learning for Object Recognition/Segmentation
Features in
Image Slice
Predicted
Classes
Badrinarayanan, et al. 2016. arXiv:1511.00561v3
• Deep Learning for Object Recognition/Segmentation
Features in
Volume Slice
Predicted
Classes
Badrinarayanan, et al. 2016. arXiv:1511.00561v3
Visualization
• VR
• Networks
• Sonics
Starting point: Structure of a bacterial
protein involved in RNA-associated
regulatory circuits (e.g., virulence)
What about dynamics (life not at T=0)?
What about physics (of RNA-binding)?
What about statistics (log-odds here)?
What about cellular-scale systems?
Kozlikova et al., 2016; Comp Graph Forum
Ethics, Law,
Policy & Social
Implications
• A Story of Fraud
Thank You
peb6a@virginia.edu