“Got a nail? I got a hammer!”
Lessons for Data science from the “dawn” of big science
Ben Keller
Data Science Dojo
14 January 2015
@vinegarbin

bjkeller.github.io

linkedin.com/in/bjkeller
Creative Commons Attribution-ShareAlike 4.0 International License
Some context
Almost 10 years ago, the NIH program “National Centers for Biomedical Computing” started
Goal: to answer questions of driving projects in biomedicine using computing
“Big science” [though maybe not the “dawn”]
A different perspective
Questions center around story-telling
“what molecular activity links this genetic
change to the symptoms of type 2 diabetes?”
Overriding goal to build software to find answers
Proof-of-concept analysis to drive development
"I got a hammer – you got a nail?"
A computational scientist – trained in algorithms – thinks of a problem as
Given: a set of genes G, a covariance matrix M over expression of genes in G
Find: a family of gene sets {Gi}, subsets of G, such that...
...and will build tools that solve it
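To make the "Given / Find" framing concrete, here is a minimal sketch of what such a tool might look like. The slide elides the actual condition after "such that...", so the grouping rule used here (link two genes when the absolute covariance reaches a threshold, then take connected components) is a hypothetical stand-in, and the gene names and covariance values are toy data invented for illustration.

```python
from itertools import combinations


def gene_sets(genes, cov, threshold):
    """Group genes into sets using a HYPOTHETICAL rule: two genes are linked
    when |cov[a][b]| >= threshold, and each connected component is one set.
    The slide does not state the real condition."""
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if abs(cov[a][b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(groups.values(), key=sorted)


# Toy input (invented): a symmetric covariance matrix as nested dicts.
genes = ["geneA", "geneB", "geneC", "geneD"]
cov = {
    "geneA": {"geneA": 1.0, "geneB": 0.9, "geneC": 0.1, "geneD": 0.0},
    "geneB": {"geneA": 0.9, "geneB": 1.0, "geneC": 0.2, "geneD": 0.1},
    "geneC": {"geneA": 0.1, "geneB": 0.2, "geneC": 1.0, "geneD": -0.8},
    "geneD": {"geneA": 0.0, "geneB": 0.1, "geneC": -0.8, "geneD": 1.0},
}
family = gene_sets(genes, cov, threshold=0.5)
```

The code is tidy and solvable, which is exactly the appeal of this framing; nothing in it says what a gene set means to a biologist.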
But biologists think of their problems in different ways:


What do you see?
p1 p2 …
s1 42.211 9.3211 …
s2 2.192 8.9942 …
⋮ ⋮ ⋮ ⋱
We see what we recognize
The van carrying my geology class stops next to a rock feature that looks like:
John, napping in the back seat, wakes up briefly and looks out the window.
What did he see?
John will tell you he saw a chevron fold formed by opposing pressure on the rock layers
Everyone else saw that water flowing along a crack had formed a v-shaped channel in steeply sloping layers (Sorry, John. You’re still wrong.)
So, what do you see?
"tabular data"
You might see
columns on which to do regression
a matrix on which to do matrix factorization
a graph connecting subjects to features
variables on which to measure mutual information
variables on which to do Bayesian inference
You might see
A proxy problem that you
already know how to solve
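As a rough illustration of how quickly one table turns into several proxy problems, the sketch below recasts the two rows shown on the slide in three of the ways listed above. The function names and the dictionary layout are assumptions made for this example, not anything from the talk.

```python
# The two visible rows of the slide's table: subjects s1, s2 and features p1, p2.
table = {
    "s1": {"p1": 42.211, "p2": 9.3211},
    "s2": {"p1": 2.192, "p2": 8.9942},
}


def as_columns(table):
    """Regression view: one list of values per feature."""
    features = sorted({f for row in table.values() for f in row})
    return {f: [table[s][f] for s in sorted(table)] for f in features}


def as_matrix(table):
    """Matrix view: a list of rows, as a factorization routine would expect."""
    features = sorted({f for row in table.values() for f in row})
    return [[table[s][f] for f in features] for s in sorted(table)]


def as_bipartite_edges(table):
    """Graph view: one weighted edge from each subject to each feature."""
    return [(s, f, v) for s in sorted(table) for f, v in sorted(table[s].items())]


columns = as_columns(table)
matrix = as_matrix(table)
edges = as_bipartite_edges(table)
```

Each view is a few lines of code, and each one discards something: the matrix view, for instance, no longer says which row is which subject or what a feature measures.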
We solve what we see
A scientist had results linking genomic regions to each other in subjects with bipolar disorder
Asking: what is common?
Previously used a graph to represent what was common in recommender systems
We get answers, but they are hard to interpret biologically
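A minimal sketch of the kind of graph proxy described here, assuming each subject contributes a list of region-to-region links and "common" means a link that appears in at least a given fraction of subjects. The function name, the `min_fraction` parameter, and the region and subject labels are hypothetical; the talk does not say how the actual analysis was done.

```python
from collections import Counter


def common_links(links_by_subject, min_fraction=1.0):
    """Return the undirected region-region links found in at least
    min_fraction of the subjects, as sorted pairs."""
    counts = Counter()
    for links in links_by_subject.values():
        # Treat (a, b) and (b, a) as the same link; count once per subject.
        counts.update({frozenset(link) for link in links})
    needed = min_fraction * len(links_by_subject)
    return sorted(tuple(sorted(link)) for link, n in counts.items() if n >= needed)


# Toy input (invented labels, not real genomic regions).
links_by_subject = {
    "subject1": [("r1", "r2"), ("r2", "r3")],
    "subject2": [("r2", "r1"), ("r3", "r4")],
    "subject3": [("r1", "r2"), ("r2", "r3")],
}
shared_by_all = common_links(links_by_subject)
shared_by_most = common_links(links_by_subject, min_fraction=0.6)
```

The function returns an answer without complaint, but an edge list of shared links is still a graph-theory object; what it means biologically is left to the scientist.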
[Figure labels: CDKN2A/B, PPARG, HHEX, TCF7L2, "mortality", "g1", "repression"]
Cognitive engineering tells us that we have to manage the relationships between
- how we think of the problem and how it is represented by tools
- the way we think of tasks and what is allowed by tools
Lesson:
see the data and problem as
the expert “owner” sees them
Read as:
- leave the data as it is, and avoid exposing abstractions not already in the original problem
- provide an expert-understandable explanation of the analysis, and, if you can’t, rethink whether the approach is useful
“Need hot water for your bath? Here’s a bucket, a pot and a stove. The well is outside!”
Stories we are looking for are complex
[Figure labels: A, B, C, D]
with interrelated data
and interrelated chains of analysis
Often have to translate data between tools, and change perspective
Lesson
Any analysis is part of a larger question
Read as:
- reduce the cognitive load of interpreting between analysis steps
- understand how different steps relate, and try to help the expert understand the flow of analysis
Experts reason in complex ways
- use different modes of reasoning
- may switch between them at any time
Corollary:
Appeal to cognitive science
Read as:
- use studies already done to understand how scientists/experts do their work
- work with a cognitive science expert to develop understanding of domain experts
"Oh, that's easy! Just use a hammer!"
A complex problem
- involves uncertainty
- draws on incomplete and diverse sources of information
- may be affected by several factors and be driven by competing objectives
(Mirel, Interaction design for complex problem solving, 2004)
A systems biology problem:
A complex problem because
- uncertain what is an actual solution
- involves diverse, incomplete, and possibly irrelevant information
- based on incomplete observations, affected by technology/methodology
- conflicting objectives of predicting/remediating/understanding disease
Lesson:
Analysis is a complex problem
Corollary:
Embrace the uncertainty
Read as:
- expect not to know what the expert needs, and for them not to know what they need
- be agile: build the analysis in conversation with the expert to push understanding
Corollary
Data will be “special”
Read as:
- understand where your data is coming from, what it represents, and how the data owner sees it
- understand sources of error/noise
Corollary
Objectives drive the question
Read as:
- understand the objectives for the analysis
- be clear to the data owner about which objectives are being met
Corollary
The question will change
Read as:
once analysis gives the answer, the expert may recognize it was the wrong question, or may come up with another one
Ultimate Lesson: it's the people
Thanks to Barbara Mirel (U. Michigan)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
