Automated Hypothesis Testing with
Large Scale Scientific Workflows
Yolanda Gil
Daniel Garijo
Rajiv Mayani
Varun Ratnakar
Information Sciences Institute
& Department of Computer Science
University of Southern California
http://www.isi.edu
Parag Mallick
Ravali Adusumilli
Hunter Boyce
Stanford School of Medicine
Canary Center for Early Cancer Detection
Stanford University
http://mallicklab.stanford.edu
http://www.disk-project.org
Talk Outline
๏ Motivation
๏ Research Challenges
1. Representing Hypotheses
2. Representing Lines of Inquiry
3. Meta-analysis to review workflow results
๏ DISK Scenario walkthrough
๏ Results in cancer multi-omics
๏ Related work
๏ Contributions and Future Work
Scientific Data Analysis Today:
Inefficient, Incomplete, Irreproducible
๏ Data analysis is time-consuming
๏ Not systematic
๏ Not updated when new data/methods become available
๏ Hard/impractical to reproduce prior work
๏ Overall process is done manually: inefficient and error-prone
๏ Analytic knowledge is compartmentalised
(Figure: the discovery cycle: new hypothesis → formulate line of inquiry (data + method) → retrieve data → run workflows (methods) → meta-analysis of results)
Our Focus: Cancer Multi-Omics
๏ Data Availability and Complexity:
• The multi-omic domain is filled with multiple levels of heterogeneous data that are regularly expanding in volume and complexity through projects like The Cancer Genome Atlas (TCGA) and the associated Clinical Proteomic Tumor Analysis Consortium (CPTAC)
Our Focus: Cancer Multi-Omics
๏ Analytic Complexity:
• Multi-omic analysis requires the use of dozens of interconnected tools, each of which may require substantial domain knowledge.
(Figure: example toolchain including MAQ, BWA, BWA-SW (SE only), PERM, SOAPv2, MOSAIK, NOVOALIGN, SAMTOOLS, PICARD, GATK and IGVtools; caption: "Domain knowledge is isolated")
Our Focus: Cancer Multi-Omics
๏ Multiple types and complexities of hypotheses:
• Hypotheses span the range from single-gene/single-dataset to multi-gene/multi-ome/multi-dataset
• Is this protein found in this sample?
• Is this gene found in this sample?
• Is this protein associated with a certain cancer?
• Which proteins are associated with a certain cancer?
• ...
Talk Outline
๏ Motivation
๏ Our Approach & Research Challenges
1. Representing Hypotheses
2. Representing Lines of Inquiry
3. Meta-analysis to review workflow results
๏ DISK Scenario walkthrough
๏ Results in cancer multi-omics
๏ Related work
๏ Contributions and Future Work
Our Approach: Hypothesis-Driven Discovery
๏ Represent scientists' hypotheses
๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued by data analysis workflows
๏ Design a meta-analysis that examines the results of lines of inquiry and either validates or revises the original hypotheses
๏ Develop an intelligent agent that can report and explain new findings to the scientist
(Figure: Hypothesis → Lines of inquiry (specify relevant analytic methods (workflows), type of data needed, and how to combine results) → Query to retrieve data → Data analysis workflows → Workflow bindings → Meta-workflows (confidence estimation, benchmarking) → Revised hypothesis & interesting findings)
Representing Hypotheses
Requirements from Omics
๏ Graph-based hypothesis representation
• Entities are nodes
• Relationships are links
๏ Annotations on graphs
• Represent qualifications of hypotheses: confidence and evidence
๏ Representing hypothesis evolution
• Graph versioning
Graph representation in RDF
๏ Standard semantic web language
๏ Scalable reasoners available
๏ Qualifications and provenance through triple reification
๏ Versioning through multiple named graphs
Representing Hypotheses
(Figure: hypotheses are stored as named graphs that draw on a biology ontology, a hypothesis ontology, and user data definitions. Graph Hy1: bio:PRKCDBP hyp:expressedIn user:TCGA-AA-3561-01A-22. Graph Hy2: bio:PRKCDBP hyp:associatedWith bio:ColonCancer.)
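The slides propose RDF named graphs for versioning and triple reification for qualifications. A minimal sketch of both ideas, using plain Python sets instead of a real RDF store: each named graph is a set of triples, and a qualification such as a confidence value is attached to an `rdf:Statement` node that describes the qualified triple. The class `NamedGraphStore`, its methods, and the `hyp:` prefix on `hasConfidenceValue` are hypothetical; `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` are the standard RDF reification vocabulary.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy store (hypothetical): each named graph is a set of (subject, predicate, object) triples."""

    def __init__(self):
        self.graphs = defaultdict(set)
        self._next_id = 0

    def add(self, graph, s, p, o):
        self.graphs[graph].add((s, p, o))

    def reify(self, graph, s, p, o, qualifications):
        """Describe triple (s, p, o) as an rdf:Statement in `graph` and attach qualifications to it."""
        self._next_id += 1
        stmt = f"_:stmt{self._next_id}"  # blank node standing for the statement itself
        g = self.graphs[graph]
        g.update({
            (stmt, "rdf:type", "rdf:Statement"),
            (stmt, "rdf:subject", s),
            (stmt, "rdf:predicate", p),
            (stmt, "rdf:object", o),
        })
        for prop, value in qualifications.items():
            g.add((stmt, prop, value))
        return stmt

    def match(self, graph, s=None, p=None, o=None):
        """Triples in `graph` matching the pattern; None acts as a wildcard."""
        return [t for t in self.graphs.get(graph, set())
                if all(q is None or q == v for q, v in zip((s, p, o), t))]


store = NamedGraphStore()
# Graphs Hy1 and Hy2 from the talk's PRKCDBP example
store.add("Hy1", "bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")
store.add("Hy2", "bio:PRKCDBP", "hyp:associatedWith", "bio:ColonCancer")
# The revised version Hy1' lives in its own named graph; its confidence is attached by reification
stmt = store.reify("Hy1'", "bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22",
                   {"hyp:hasConfidenceValue": 0.0})
print(store.match("Hy1'", p="hyp:hasConfidenceValue"))
```

Keeping each version in its own named graph means an older hypothesis is never overwritten; a real deployment would use a quad store queried with SPARQL.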
Lifecycle of a hypothesis
1. Initial Hypothesis, Data & Workflows
(Figure: hypothesis statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22. Data available: sample TCGA-AA-3561-01A-22, collectedFrom patient TCGA-AA-3561; experiment AA_3561_EX2, experimentedOn that sample, producedData XX_3561Proteome_VU.zip (MassSpecData). Workflows available: Proteomics, Proteogenomics.)
2. Running workflows on Data
(Figure: same data and workflows as in step 1. Workflow execution W1, linked to its workflow template by hasWorkflowTemplate, used the mass spec data; its outputs provide the qualifications and provenance of Hy1'.)
3. Meta reasoning about workflow results
(Figure: meta-workflow execution MW1 used the results of W1 and produced the revised hypothesis statement Hy1', a revisionOf Hy1. Its statement Hy1'-S1 (PRKCDBP expressedIn TCGA-AA-3561-01A-22) hasConfidenceValue 0 and records its provenance via hasProvenance.)
4. New Data becomes available
(Figure: a second experiment, AA_3561_EX1, on the same sample TCGA-AA-3561-01A-22 producedData XX_3561_DD.zip (RNASeqData), alongside XX_3561Proteome_VU.zip (MassSpecData) from AA_3561_EX2. The hypothesis under test is now statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22. Workflows available: Proteomics, Proteogenomics.)
5. New Multi-Workflows are also run
(Figure: workflow executions W1 and W2 are run over the available mass spec and RNA-Seq data; their outputs provide the qualifications of Ha1', which are linked by hasProvenance to the provenance of Ha1'.)
6. Hypothesis Revision
(Figure: meta-workflow execution MW2 used the results of W1 and W2 and produced the revised hypothesis statement Ha1', a revisionOf Ha1. Its statement Ha1'-S1, mutated PRKCDBP expressedIn TCGA-AA-3561-01A-22, hasConfidenceValue 0.98.)
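Steps 1 to 6 amount to a versioning discipline: a revision never overwrites a hypothesis; it creates a new version linked by revisionOf and annotated with a confidence value and the executions it used. A minimal sketch of that bookkeeping, assuming plain Python dataclasses rather than the RDF representation DISK uses; `HypothesisVersion`, `revise`, `lineage`, and the `bio:PRKCDBP_Mutated` identifier are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class HypothesisVersion:
    hid: str
    statement: Triple
    confidence: Optional[float] = None                  # hasConfidenceValue; None until assessed
    revision_of: Optional["HypothesisVersion"] = None   # revisionOf link to the earlier version
    produced_by: Optional[str] = None                   # meta-workflow execution that produced it
    used: list = field(default_factory=list)            # workflow executions whose results it used


def revise(original, new_id, statement, confidence, meta_execution, used):
    """Create a new version instead of mutating the original, so earlier versions stay queryable."""
    return HypothesisVersion(new_id, statement, confidence, original, meta_execution, list(used))


def lineage(version):
    """Follow revisionOf links back to the initial hypothesis."""
    ids = []
    while version is not None:
        ids.append(version.hid)
        version = version.revision_of
    return ids


ha1 = HypothesisVersion("Ha1", ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22"))
ha1_rev = revise(
    ha1, "Ha1'",
    # hypothetical encoding of "mutated PRKCDBP"
    ("bio:PRKCDBP_Mutated", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22"),
    confidence=0.98, meta_execution="MW2", used=["W1", "W2"],
)
print(lineage(ha1_rev))
```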
Representing Lines of Inquiry & Data analysis workflows
Lines of Inquiry
๏ Capture how to set up the potential analyses that can be pursued to test a certain type of hypothesis
Figure: a line of inquiry links four parts:
• Hypothesis pattern: bio:Protein ?p hyp:expressedIn bio:Sample ?s
• Data query pattern: Experiment ?e experimentedOn Sample ?s; Sample ?s collectedFrom a Patient; Experiment ?e producedData DataFile ?d
• Data analytic workflows: Proteomics, Proteogenomics, run on DataFile ?d
• Meta-workflows: Comparison, Confidence estimation, Benchmarking
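A line of inquiry can be read as two graph patterns: a hypothesis pattern matched against the user's hypothesis, and a data query pattern whose variables are then filled in from that match. The sketch below is a naive conjunctive pattern matcher over triples, assuming the relation names used on the slide (experimentedOn, producedData, collectedFrom). The catalog follows the walkthrough's example (the assignment of files to experiments is our reading of the figure), and `match_triple` and `query` are hypothetical stand-ins for a real SPARQL engine.

```python
def is_var(term):
    return term.startswith("?")


def match_triple(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return extended bindings, or None on a clash."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            new[p_term] = t_term
        elif p_term != t_term:
            return None
    return new


def query(patterns, triples, bindings=None):
    """Naive conjunctive query: all bindings that satisfy every pattern."""
    results = [dict(bindings or {})]
    for pattern in patterns:
        extended = []
        for b in results:
            for t in triples:
                m = match_triple(pattern, t, b)
                if m is not None:
                    extended.append(m)
        results = extended
    return results


HYPOTHESIS_PATTERN = [("?p", "hyp:expressedIn", "?s")]
DATA_QUERY_PATTERN = [("?e", "experimentedOn", "?s"), ("?e", "producedData", "?d")]

CATALOG = [
    ("user:TCGA-AA-3561-01A-22", "collectedFrom", "TCGA-AA-3561"),
    ("AA_3561_EX1", "experimentedOn", "user:TCGA-AA-3561-01A-22"),
    ("AA_3561_EX1", "producedData", "XX_3561_DD.zip"),
    ("AA_3561_EX2", "experimentedOn", "user:TCGA-AA-3561-01A-22"),
    ("AA_3561_EX2", "producedData", "XX_3561Proteome_VU.zip"),
]

hypothesis = ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")
hyp_bindings = query(HYPOTHESIS_PATTERN, [hypothesis])               # binds ?p and ?s
data_bindings = query(DATA_QUERY_PATTERN, CATALOG, hyp_bindings[0])  # ?s is already fixed
files = {b["?d"] for b in data_bindings}
print(sorted(files))
```

Passing the hypothesis bindings into the data query is what customizes a generic line of inquiry to the hypothesis at hand.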
Example Multi-omics Workflow (Zhang et al. replication)
Automated Workflow Generation in WINGS by Reasoning about Semantic Constraints
Example: all input data must come from the human species, i.e., must have HS in their metadata
The workflow system uses this constraint to select only datasets that have HS in their metadata, so that the selected inputs are valid
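The effect of the "human species only" constraint can be mimicked with a simple metadata filter. WINGS reasons over semantic metadata rather than Python dictionaries, so this only illustrates the selection step; the `species` key, the `MM` code (standing in for a non-human species), and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    metadata: dict


def satisfies(dataset, constraint):
    """True if every property required by the constraint has the required value in the metadata."""
    return all(dataset.metadata.get(key) == value for key, value in constraint.items())


def select_inputs(datasets, constraint):
    """Keep only datasets whose metadata satisfy the constraint, as valid workflow inputs."""
    return [d.name for d in datasets if satisfies(d, constraint)]


HUMAN_ONLY = {"species": "HS"}  # hypothetical key; the slide only says inputs must have HS in metadata

candidates = [
    Dataset("run_a.mzML", {"species": "HS", "type": "MassSpecData"}),
    Dataset("run_b.mzML", {"species": "MM", "type": "MassSpecData"}),
    Dataset("run_c.fastq", {"type": "RNASeqData"}),  # no species annotation, so it is rejected
]
print(select_inputs(candidates, HUMAN_ONLY))
```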
Meta-Analysis to Review Workflow Results
Meta-workflows:
1) Comparison Meta-Workflows
Figure: variant detection produces a custom protein DB; protein identification is run against the custom DB and against a reference DB, and the comparison meta-workflow computes a similarity score between the two resulting sets of protein IDs. The comparison is data dependent and can be done at the peptide, protein, or scan level.
๏ Goals:
• Compare results amongst multiple workflows
• Measure the global similarity amongst multiple workflows
• Provide users with explanation of workflow-dependent
differences in results
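The comparison meta-workflow needs a similarity score between result sets, but the slides do not name the measure. The sketch below assumes Jaccard similarity (size of the intersection over size of the union) between sets of identifiers, computed pairwise across workflows, and lists the identifiers unique to each side as a simple explanation of workflow-dependent differences. The same functions apply at the peptide, protein, or scan level by passing the corresponding IDs; the protein IDs and workflow names are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of intersection over size of union; two empty result sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(results):
    """results: workflow name -> identifiers it reported (peptides, proteins or scans)."""
    return {(x, y): jaccard(results[x], results[y])
            for x, y in combinations(sorted(results), 2)}


def explain_differences(results, x, y):
    """Identifiers reported by only one of two workflows."""
    a, b = set(results[x]), set(results[y])
    return {"only_" + x: sorted(a - b), "only_" + y: sorted(b - a)}


# Placeholder protein IDs from searches against a custom DB and a reference DB
protein_ids = {"custom_db": ["P1", "P2", "P3"], "reference_db": ["P1", "P2", "P4"]}
print(pairwise_similarity(protein_ids))
print(explain_differences(protein_ids, "custom_db", "reference_db"))
```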
Meta-workflows:
2) Benchmark Meta-Workflows
๏ Goals:
• Evaluation of workflow performance
• Training of confidence estimation models (probabilistic)
Figure: the benchmark meta-workflow reports ROC curves and true/false positive rates, which are used to train the probabilistic models.
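Benchmarking compares workflow outputs against a reference with known answers. A minimal sketch of the metrics named on the slide, assuming each candidate identification carries a score and a benchmark label (true or false): true and false positive rates at a threshold, ROC points obtained by sweeping the threshold, and the area under the curve by the trapezoidal rule. The scores and labels are made up for illustration.

```python
def rates(scores, labels, threshold):
    """(true positive rate, false positive rate) when every score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


def roc_curve(scores, labels):
    """ROC points (fpr, tpr), sweeping the threshold over each distinct score from high to low."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tpr, fpr = rates(scores, labels, threshold)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.4, 0.3]       # hypothetical workflow scores for four candidate proteins
labels = [True, True, False, True]  # hypothetical benchmark truth
points = roc_curve(scores, labels)
print(points, auc(points))
```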
Meta-workflows:
3) Confidence estimation Meta-Workflows
๏ Goals:
• Combine results from multiple workflows
• Estimate the confidence of the workflow result
• Use the estimated confidence to update the hypothesis
Figure: protein IDs from identification against the custom DB and against the reference DB feed a probabilistic model, trained by the benchmark meta-workflow, which estimates a confidence value used to update the hypothesis.
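The slides leave the probabilistic model unspecified. One simple choice, shown here as an assumption and not as DISK's model, is naive Bayes: start from the prior odds that the protein is present and multiply by a likelihood ratio per workflow, using the true and false positive rates a benchmark meta-workflow would provide. This assumes workflows err independently given the truth, which is optimistic for workflows that share inputs. The rates below are hypothetical.

```python
import math


def posterior(prior, observations, performance):
    """
    Naive Bayes combination of workflow evidence (an assumed model, not DISK's).

    prior: prior probability that the protein is present.
    observations: workflow name -> True if that workflow identified the protein.
    performance: workflow name -> (true positive rate, false positive rate) from benchmarking.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1 - prior))
    for workflow, detected in observations.items():
        tpr, fpr = performance[workflow]
        if not (0 < tpr < 1 and 0 < fpr < 1):
            raise ValueError(f"rates for {workflow} must be strictly between 0 and 1")
        # Likelihood ratio of this observation under "present" versus "absent"
        ratio = tpr / fpr if detected else (1 - tpr) / (1 - fpr)
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))


performance = {"Proteomics": (0.8, 0.1), "Proteogenomics": (0.9, 0.05)}  # hypothetical rates
both = posterior(0.5, {"Proteomics": True, "Proteogenomics": True}, performance)
mixed = posterior(0.5, {"Proteomics": False, "Proteogenomics": True}, performance)
neither = posterior(0.5, {"Proteomics": False, "Proteogenomics": False}, performance)
print(round(both, 4), round(mixed, 4), round(neither, 4))
```

The resulting probability would then be recorded as the hasConfidenceValue of the revised hypothesis.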
DISK Walkthrough: Initial Hypothesis
๏ Initial hypothesis is provided by the user
• PRKCDBP protein is expressed in a patient sample
DISK Walkthrough: Lines of Inquiry
๏ The line of inquiry specifies how to pursue the hypothesis: find data from different experiments performed on the patient's sample, run multi-omic workflows on it, and combine the evidence into a confidence score
• General hypothesis pattern
• Data query pattern: search for different experiments that produced omics data (e.g., of type RNASeq and MassSpecData)
• Data analysis workflows to run on genomics and proteomics data (more omics in the future)
• Meta-workflows to assess confidence in the hypothesis based on workflow results
DISK Walkthrough: Data & Workflows
To test a hypothesis that a protein is present in a patient’s sample:
๏ Retrieve mass spec and RNASeq data
๏ Use workflows
• Wf1: Proteome only
• Wf2: ProteoGenomic
DISK Walkthrough: Meta-Workflows
๏ After running the workflows, meta-workflows analyse the results and generate a confidence value
DISK Walkthrough: Revised Hypothesis
๏ The hypothesis is revised and given a confidence value:
• A mutated form of the protein PRKCDBP is expressed in the patient's sample TCGA-AA-3561-01A-22, with confidence 0.9887
DISK Walkthrough: Provenance Details
๏ Hypothesis provenance stores information about the workflows run and the data used
• Workflow execution provenance is published by WINGS using the W3C PROV standard.
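Because each hypothesis version points to the executions it used, provenance can be traversed to answer questions such as which data a revised hypothesis ultimately depends on. A minimal sketch using W3C PROV-O terms (`prov:Activity`, `prov:Entity`, `prov:used`, `prov:wasGeneratedBy`) as plain triples; the intermediate entities `W1_results` and `W2_results` and the exact inputs of each execution are hypothetical, loosely following the walkthrough.

```python
def provenance_triples(executions):
    """executions: activity id -> {"used": [...], "generated": [...]}; returns PROV-O style triples."""
    triples = []
    for activity, io in executions.items():
        triples.append((activity, "rdf:type", "prov:Activity"))
        for entity in io["used"]:
            triples.append((activity, "prov:used", entity))
        for entity in io["generated"]:
            triples.append((entity, "rdf:type", "prov:Entity"))
            triples.append((entity, "prov:wasGeneratedBy", activity))
    return triples


def upstream(node, triples):
    """Every activity and entity that `node` transitively depends on."""
    parents = {}
    for s, p, o in triples:
        if p in ("prov:used", "prov:wasGeneratedBy"):
            parents.setdefault(s, []).append(o)
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


executions = {  # hypothetical wiring of the walkthrough's executions
    "W1": {"used": ["XX_3561Proteome_VU.zip"], "generated": ["W1_results"]},
    "W2": {"used": ["XX_3561_DD.zip", "XX_3561Proteome_VU.zip"], "generated": ["W2_results"]},
    "MW2": {"used": ["W1_results", "W2_results"], "generated": ["Ha1'"]},
}
triples = provenance_triples(executions)
print(sorted(upstream("Ha1'", triples)))
```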
DISK: Automated DIscovery of Scientific Knowledge
(Figure: DISK architecture. Hypotheses are formulated into lines of inquiry, which drive data retrieval from a data repository, workflow binding, and analytic workflows executed by the WINGS intelligent workflow system (workflow constraints, workflow reasoning, workflow provenance). Meta-workflows (confidence estimation, benchmarking) perform the meta-analysis of results and hypothesis evaluation, producing revised hypotheses and interesting findings for an interactive discovery agent. Results are openly published as linked data.)
Our Initial Focus: Reproduce Seminal Omics Analysis [Zhang et al 2014]
๏ Replicated the [Zhang et al 2014] proteogenomic analysis of colorectal cancer
๏ Successfully reproduced the paper's findings, comparing results at multiple levels (final figure, supplementary tables, etc.)
๏ Replicating the paper figures and supplementary figures took months and direct conversations with the authors
๏ Applying the analysis approach to a new cancer type now takes minutes
• Useful when TCGA is integrated
๏ Expanded the analysis to
• compare how sensitive the findings were to workflow details
(Figure: density of Spearman correlations in two panels, "Correlation between mRNA-protein abundance (within samples)" and "Correlation between mRNA-protein variation (across samples)"; x-axis: Spearman correlation, y-axis: density.)
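The two panels summarise Spearman correlations between mRNA and protein abundance: within samples (one coefficient per sample, computed across genes) and across samples (one coefficient per gene, computed across samples). A minimal sketch of both computations, assuming abundances are held as equally ordered lists per sample; Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles ties. The sample data are invented for illustration.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented abundances: sample -> values for the same four genes, in the same order
mrna = {"S1": [5.0, 3.1, 8.2, 1.0], "S2": [4.0, 3.5, 7.0, 2.0], "S3": [6.0, 2.0, 9.1, 1.5]}
protein = {"S1": [2.0, 2.4, 1.5, 0.3], "S2": [1.8, 1.0, 2.2, 0.5], "S3": [2.5, 0.9, 2.6, 0.4]}
samples = sorted(mrna)

# Within samples: one rho per sample, computed across genes
within = {s: spearman(mrna[s], protein[s]) for s in samples}
# Across samples: one rho per gene, computed across samples
across = [spearman([mrna[s][g] for s in samples], [protein[s][g] for s in samples])
          for g in range(len(mrna[samples[0]]))]
print(within, across)
```

Plotting the density of the `within` values and of the `across` values gives the two kinds of panels shown on the slide.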
Impact on Cancer Multi-Omics
Related Work
1) Discovery Systems
๏ [Lenat 1976]
๏ [Lindsay et al 1980]
๏ [Langley 1981]
๏ [Falkenhainer 1985]
๏ [Kulkarni and Simon 1988]
๏ [Cheeseman et al 1989]
๏ [Zytkow et al 1990]
๏ [Simon 1996]
๏ [Valdes-Perez 1997]
๏ [Todorovski et al 2000]
๏ [Schmidt and Lipson 2009]
Related Work:
2) Hypothesis Representation as Graphs
๏ Existing vocabularies are related but need to be extended to represent hypotheses in
DISK
• SWAN [Gao et al 2006]
• EXPO [Soldatova and King 2006]
• Nanopublications [Groth et al 2010]
• Ovopublications [Callahan and Dumontier 2013]
• Micropublications [Clark et al 2014]
• LSC
• BEL
Contributions
๏ Represent scientist hypotheses
• Hypothesis ontology includes revisions & provenance
๏ Formulate lines of inquiry that express how a type of hypothesis can be
pursued with a data analysis workflow
• Lines of inquiry outline what type of data and workflows to use, and customize
them to the hypotheses at hand
๏ Design a meta-analysis to assess the results of lines of inquiry and revise the
original hypotheses
• Meta-analysis workflows assess diverse evidence
Ongoing & Future Work
๏ Ongoing work:
• Interactive Discovery Agent that explains interesting findings
• Continuous analysis of data (TCGA/CPTAC) as it grows
• Extending and generalizing meta-workflows
• Using DISK in geosciences: Subsurface water resource modeling
๏ Future challenges:
• More complex hypotheses about several entities
• Incorporate evidence over time
• Designing domain-independent meta-workflows
• Resource-bound hypothesis exploration
Thank you

PhD Thesis: Mining abstractions in scientific workflowsdgarijo
 

Mehr von dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 

Kürzlich hochgeladen

KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...M56BOOKSTORE PRODUCT/SERVICE
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfMohonDas
 
Work Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashaWork Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashasashalaycock03
 
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxPurva Nikam
 
5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...CaraSkikne1
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.raviapr7
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxiammrhaywood
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfMohonDas
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...raviapr7
 
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....Riddhi Kevadiya
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxAditiChauhan701637
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxheathfieldcps1
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfYu Kanazawa / Osaka University
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17Celine George
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptxSandy Millin
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlDr. Bruce A. Johnson
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceApostolos Syropoulos
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptxmary850239
 
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeHow to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeCeline George
 

Kürzlich hochgeladen (20)

KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdf
 
Work Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashaWork Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sasha
 
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptx
 
5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdf
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptx
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting Bl
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial Intelligence
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptx
 
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeHow to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using Code
 

Automated Hypothesis Testing with Large Scale Scientific Workflows

  • 1. Automated Hypothesis Testing with Large Scale Scientific Workflows. Yolanda Gil, Daniel Garijo, Rajiv Mayani, Varun Ratnakar (Information Sciences Institute & Department of Computer Science, University of Southern California, http://www.isi.edu); Parag Mallick, Ravali Adusumilli, Hunter Boyce (Stanford School of Medicine, Canary Center for Early Cancer Detection, Stanford University, http://mallicklab.stanford.edu); http://www.disk-project.org
  • 2. Talk Outline ๏ Motivation ๏ Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 3. Scientific Data Analysis Today: Inefficient, Incomplete, Irreproducible ๏ Data analysis is time consuming ๏ Not systematic ๏ Not updated when new data/methods become available ๏ Hard/impractical to reproduce prior work ๏ The overall process is done manually: inefficient and error-prone ๏ Analytic knowledge is compartmentalised [Figure: cycle of New hypothesis → Formulate line of inquiry (data + method) → Retrieve data → Run workflows (methods) → Meta-analysis of results]
  • 4. Our Focus: Cancer Multi-Omics ๏ Data Availability and Complexity: • The multi-omic domain is filled with multiple levels of heterogeneous data that are regularly expanding in volume and complexity through projects like The Cancer Genome Atlas (TCGA) and the associated Clinical Proteomic Tumor Analysis Consortium (CPTAC)
  • 5. Our Focus: Cancer Multi-Omics ๏ Analytic Complexity: • Multi-omic analysis requires the use of dozens of interconnected tools, each of which may require substantial domain knowledge. [Figure: tool pipeline including MAQ, BWA, BWA-SW (SE only), PERM, SOAPv2, MOSAIK, NOVOALIGN, SAMTOOLS, PICARD, GATK, IGVtools] Domain knowledge is isolated
  • 6. Our Focus: Cancer Multi-Omics ๏ Multiple types and complexities of hypotheses: • Hypotheses span the range from single-gene/single-dataset to multi-gene/multi-ome/multi-dataset • Is this protein found in this sample? • Is this gene found in this sample? • Is this protein associated with a certain cancer? • Which proteins are associated with a certain cancer? • ...
  • 7. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 8. Our Approach: Hypothesis-Driven Discovery ๏ Represent scientists' hypotheses ๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued by data analysis workflows ๏ Design a meta-analysis that examines the results of lines of inquiry and either validates or revises the original hypotheses ๏ Develop an intelligent agent that can report and explain new findings to the scientist [Figure: DISK overview. Hypothesis → Lines of Inquiry (specify relevant analytic methods (workflows), type of data needed, and how to combine results) → Query to retrieve Data → Data Analysis Workflows, Workflow Bindings → Meta-Workflows (Confidence Estimation, Benchmarking) → Revised hypothesis & interesting findings]
  • 9. Representing Hypotheses [Figure: DISK overview diagram repeated from slide 8]
  • 10. Requirements from Omics ๏ Graph-based hypothesis representation • Entities are nodes • Relationships are links ๏ Annotations on graphs • Represent qualifications of hypotheses: confidence and evidence ๏ Representing hypothesis evolution • Graph versioning Graph representation in RDF ๏ Standard semantic web language ๏ Scalable reasoners available ๏ Qualifications and provenance through triple reification ๏ Versioning through multiple named graphs Representing Hypotheses
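The slide lists two RDF mechanisms: named graphs for versions and triple reification for qualifications. The following is a minimal, hypothetical sketch of how those two ideas fit together, using a toy in-memory quad store rather than a real RDF library. The `hyp:hasConfidenceValue` and `hyp:revisionOf` predicate names are modeled on the labels in the slides; the prefix and the store itself are assumptions.

```python
from collections import defaultdict, namedtuple

# A quad is an RDF-style triple plus the name of the graph that holds it.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])


class QuadStore:
    """Toy in-memory quad store (illustrative only, not an RDF library)."""

    def __init__(self):
        self.quads = []
        self._statement_counts = defaultdict(int)  # per-graph statement numbering

    def add(self, s, p, o, graph):
        self.quads.append(Quad(s, p, o, graph))

    def graph(self, name):
        """Return the (s, p, o) triples stored in one named graph."""
        return [(q.subject, q.predicate, q.obj) for q in self.quads if q.graph == name]

    def reify(self, s, p, o, graph, **qualifiers):
        """Describe (s, p, o) as a statement node so it can carry qualifiers."""
        self._statement_counts[graph] += 1
        stmt = f"{graph}-S{self._statement_counts[graph]}"
        self.add(stmt, "rdf:type", "rdf:Statement", graph)
        self.add(stmt, "rdf:subject", s, graph)
        self.add(stmt, "rdf:predicate", p, graph)
        self.add(stmt, "rdf:object", o, graph)
        for name, value in qualifiers.items():
            self.add(stmt, f"hyp:{name}", value, graph)  # hypothetical hyp: prefix
        return stmt


store = QuadStore()
# Version 1 of the hypothesis lives in named graph Hy1.
store.add("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22", "Hy1")
# The revision lives in its own graph, holding a reified, qualified statement.
stmt = store.reify("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22",
                   "Hy1'", hasConfidenceValue=0.0)
store.add("Hy1'", "hyp:revisionOf", "Hy1", "Hy1'")
```

Keeping each version in its own graph means the original Hy1 triples are never modified; a revision only adds new quads.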
  • 12. Lifecycle of a hypothesis [Figure: Graph Hy1 states bio:PRKCDBP hyp:expressedIn user:TCGA-AA-3561-01A-22; Graph Hy2 states bio:PRKCDBP hyp:associatedWith bio:ColonCancer. Terms are drawn from a Biology ontology (bio:), a Hypothesis ontology (hyp:) and User data definitions (user:)]
  • 13. 1. Initial Hypothesis, Data & Workflows [Figure: Data Available: Patient TCGA-AA-3561; Sample TCGA-AA-3561-01A-22, collectedFrom the patient; Experiment AA_3561_EX2, experimentedOn the sample, producedData XX_3561Proteome_VU.zip (MassSpecData). Workflows Available: Proteomics, Proteogenomics. Hypothesis Statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22]
  • 14. 2. Running workflows on Data [Figure: the same data and workflows as slide 13; Workflow Execution W1 hasWorkflowTemplate one of the available workflows and used the data. Hypothesis Statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22]
  • 15. 3. Meta reasoning about workflow results [Figure: the data and Workflow Execution W1 from slide 14; Meta-Workflow Execution MW1 used the output of W1 and produced the Revised Hypothesis Hy1' (revisionOf Hy1), whose Statement Hy1'-S1 "PRKCDBP expressedIn TCGA-AA-3561-01A-22" hasConfidenceValue 0. Hy1' hasProvenance pointing to the Provenance of Hy1', shown next to the Qualifications of Hy1']
  • 16. 4. New Data becomes available [Figure: two experiments on Sample TCGA-AA-3561-01A-22 (collectedFrom Patient TCGA-AA-3561), AA_3561_EX1 and AA_3561_EX2, now provide XX_3561Proteome_VU.zip (MassSpecData) and XX_3561_DD.zip (RNASeqData). Workflows Available: Proteomics, Proteogenomics. Hypothesis Statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22]
  • 17. 5. New Multi-Workflows are also run [Figure: with both datasets available, Workflow Executions W1 and W2 use the data. Workflows Available: Proteomics, Proteogenomics. Hypothesis Statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22]
  • 18. 6. Hypothesis Revision [Figure: Meta-Workflow Execution MW2 used the outputs of W1 and W2 and produced the Revised Hypothesis Ha1' (revisionOf Ha1), whose Statement Ha1'-S1 "PRKCDBP Mutated expressedIn TCGA-AA-3561-01A-22" hasConfidenceValue 0.98. Ha1' hasProvenance pointing to the Provenance of Ha1', shown next to the Qualifications of Ha1']
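Slides 13 to 18 follow one rule: a revision never overwrites a hypothesis. It creates a new version that points back to the old one and records which executions produced it. The sketch below shows that rule as a plain data structure. The field names and the `revise` helper are hypothetical and are not DISK's actual hypothesis ontology. The values (Ha1, W1, W2, MW2, 0.98) are taken from slide 18.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Hypothesis:
    """One version of a hypothesis; revisions never overwrite earlier versions."""
    hyp_id: str
    statement: Tuple[str, str, str]       # (subject, predicate, object)
    confidence: Optional[float] = None    # unset until a meta-workflow scores it
    revision_of: Optional[str] = None     # id of the version this one revises
    provenance: List[str] = field(default_factory=list)  # executions behind it


def revise(previous, statement, confidence, executions):
    """Return a new version that points back at `previous` (left unchanged)."""
    return Hypothesis(
        hyp_id=previous.hyp_id + "'",
        statement=statement,
        confidence=confidence,
        revision_of=previous.hyp_id,
        provenance=list(executions),
    )


ha1 = Hypothesis("Ha1", ("PRKCDBP", "expressedIn", "TCGA-AA-3561-01A-22"))
ha1_rev = revise(ha1, ("PRKCDBP Mutated", "expressedIn", "TCGA-AA-3561-01A-22"),
                 0.98, ["W1", "W2", "MW2"])
```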
  • 19. Representing Lines of Inquiry & Data analysis workflows [Figure: DISK overview diagram repeated from slide 8]
  • 20. Lines of Inquiry ๏ Capture how to set up potential analyses that can be pursued to test a certain type of hypothesis [Figure: Hypothesis Pattern: ?p (bio:Protein) hyp:expressedIn ?s (bio:Sample). Data Query Pattern: Sample ?s collectedFrom a Patient; Experiment ?e experimentedOn ?s and producedData DataFile ?d. Data Analytic Workflows: Proteomics, Proteogenomics, run on DataFile ?d. Meta-workflows: Comparison, Confidence estimation, Benchmarking]
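A line of inquiry pairs a hypothesis pattern with a data query pattern. Matching the hypothesis binds its variables, and those bindings then instantiate the data query. The following sketch illustrates that two-step idea with naive triple-pattern matching in plain Python. It is not DISK's query engine, which works over RDF. The variable `?pt` for the patient, the `rdf:type` triples, and the experiment-to-file pairing in `data` are illustrative assumptions based on slides 13 and 16.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, bindings):
    """Unify one pattern triple with one concrete triple, or return None."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None
            result[p] = t
        elif p != t:
            return None
    return result


def match_pattern(patterns, triples, bindings=None):
    """Yield every binding set satisfying all pattern triples (naive backtracking)."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = match_triple(first, triple, bindings)
        if b is not None:
            yield from match_pattern(rest, triples, b)


def substitute(patterns, bindings):
    """Instantiate a query pattern with variables bound by the hypothesis."""
    return [tuple(bindings.get(t, t) if is_var(t) else t for t in p) for p in patterns]


hypothesis = [("bio:PRKCDBP", "rdf:type", "bio:Protein"),
              ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22"),
              ("user:TCGA-AA-3561-01A-22", "rdf:type", "bio:Sample")]
hyp_pattern = [("?p", "rdf:type", "bio:Protein"),
               ("?p", "hyp:expressedIn", "?s"),
               ("?s", "rdf:type", "bio:Sample")]
data_query = [("?s", "collectedFrom", "?pt"),
              ("?e", "experimentedOn", "?s"),
              ("?e", "producedData", "?d")]
data = [("user:TCGA-AA-3561-01A-22", "collectedFrom", "user:TCGA-AA-3561"),
        ("user:AA_3561_EX1", "experimentedOn", "user:TCGA-AA-3561-01A-22"),
        ("user:AA_3561_EX2", "experimentedOn", "user:TCGA-AA-3561-01A-22"),
        ("user:AA_3561_EX1", "producedData", "user:XX_3561_DD.zip"),
        ("user:AA_3561_EX2", "producedData", "user:XX_3561Proteome_VU.zip")]

bindings = next(match_pattern(hyp_pattern, hypothesis))
query = substitute(data_query, bindings)
data_files = {b["?d"] for b in match_pattern(query, data)}
```

Each binding for `?d` is a data file that the workflows in the line of inquiry could then be run on.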
  • 21. Example Multi-omics Workflow (Zhang et al. replication)
  • 22. Automated Workflow Generation in WINGS by Reasoning about Semantic Constraints. Example: all input data must be from the human species, i.e., must have HS in their metadata. The workflow system uses this constraint to select only datasets that have HS in their metadata, so the selected inputs are valid.
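WINGS reasons over semantic constraints expressed in ontologies. The sketch below is a deliberately simplified stand-in for the slide's example: it filters candidate datasets with a plain metadata equality check. The `Dataset` class, the `species` key, and the dataset names are hypothetical and are not part of WINGS.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    metadata: dict


def satisfies(dataset, constraints):
    """True if every (key, value) constraint appears in the dataset's metadata."""
    return all(dataset.metadata.get(k) == v for k, v in constraints.items())


def select_inputs(candidates, constraints):
    """Keep only datasets that are valid inputs under the workflow's constraints."""
    return [d for d in candidates if satisfies(d, constraints)]


candidates = [Dataset("run_a", {"species": "HS"}),
              Dataset("run_b", {"species": "MM"}),
              Dataset("run_c", {})]  # missing metadata is treated as not satisfying
valid = select_inputs(candidates, {"species": "HS"})
```

In this simplification, a dataset with no species annotation is rejected rather than assumed valid. This is one conservative design choice among several possible ones.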
  • 23. Representing Hypotheses [Figure: DISK overview diagram repeated from slide 8]
  • 24. Meta-workflows: 1) Comparison Meta-Workflows [Figure: Variant Detection builds a Custom Protein DB; Protein Identification is run against the Custom DB and against a Reference DB; the two resulting sets of Protein IDs are compared into a Similarity Score, which is data dependent: peptide level, protein level, scan level] ๏ Goals: • Compare results amongst multiple workflows • Measure the global similarity amongst multiple workflows • Provide users with an explanation of workflow-dependent differences in results
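The slide does not specify how the similarity score is computed. As one plausible choice, the sketch below uses the Jaccard index over the protein IDs from the custom-DB and reference-DB runs. It also lists the IDs unique to each run, which could support the "explanation of workflow-dependent differences" goal. The metric and the placeholder IDs are assumptions, and the same function could be applied at the peptide or scan level.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty result sets are treated as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_runs(custom_ids, reference_ids):
    """Summarize agreement between two protein identification runs."""
    custom, reference = set(custom_ids), set(reference_ids)
    return {
        "similarity": jaccard(custom, reference),
        "only_custom": sorted(custom - reference),       # e.g. variant peptides
        "only_reference": sorted(reference - custom),
    }


result = compare_runs(["prot1", "prot2", "prot3", "prot4"],
                      ["prot2", "prot3", "prot4", "prot5"])
```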
  • 25. Meta-workflows: 2) Benchmark Meta-Workflows ๏ Goals: • Evaluate workflow performance • Train (probabilistic) confidence estimation models [Figure: the Benchmark Meta-Workflow produces ROC curves and true/false positive rates that feed Probabilistic Models]
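A benchmark meta-workflow needs results with known ground truth. The following sketch shows the standard computation behind the ROC and true/false positive rates named on the slide. It sweeps a score threshold over one workflow's scored identifications and then integrates the curve with the trapezoid rule. The scores and truth set are made-up illustrations, not results from the talk.

```python
def rates(predicted, truth, universe):
    """True positive rate and false positive rate of a set of predictions."""
    predicted, truth, universe = set(predicted), set(truth), set(universe)
    negatives = universe - truth
    tpr = len(predicted & truth) / len(truth) if truth else 0.0
    fpr = len(predicted & negatives) / len(negatives) if negatives else 0.0
    return tpr, fpr


def roc_points(scores, truth):
    """(fpr, tpr) for each distinct threshold, from strictest to loosest."""
    points = []
    for threshold in sorted(set(scores.values()), reverse=True):
        predicted = [item for item, s in scores.items() if s >= threshold]
        tpr, fpr = rates(predicted, truth, scores.keys())
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule, starting from (0, 0)."""
    pts = [(0.0, 0.0)] + sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.2}   # workflow score per candidate
truth = {"a", "c"}                                  # known true identifications
points = roc_points(scores, truth)
```

The per-workflow rates at a chosen threshold are the kind of quantity a confidence model can be trained on.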
  • 26. Meta-workflows: 3) Confidence estimation Meta-Workflows ๏ Goals: • Combine results from multiple workflows • Estimate confidence in the workflow result • Use the estimated confidence to update the hypothesis [Figure: Protein Identification against a Custom DB and a Reference DB yields two sets of Protein IDs; a Probabilistic Model, trained by the Benchmark Meta-Workflow, estimates confidence and updates the hypothesis]
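The slides do not describe the probabilistic model itself. A minimal sketch, assuming a naive-Bayes combination, shows how benchmark rates could turn several workflow outcomes into one confidence value. Each workflow's detection (or non-detection) multiplies the prior odds by a likelihood ratio built from its true and false positive rates. The independence assumption, the prior, and the example rates are all illustrative.

```python
def posterior_confidence(prior, observations):
    """
    prior: P(hypothesis true) before looking at workflow results.
    observations: (detected, tpr, fpr) per workflow, with tpr/fpr assumed to
    come from a benchmark meta-workflow.
    Assumes workflows err independently given the truth (naive Bayes).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for detected, tpr, fpr in observations:
        if not (0.0 < tpr < 1.0 and 0.0 < fpr < 1.0):
            raise ValueError("rates must be strictly between 0 and 1")
        # Likelihood ratio P(obs | true) / P(obs | false) for this workflow.
        odds *= tpr / fpr if detected else (1.0 - tpr) / (1.0 - fpr)
    return odds / (1.0 + odds)


# Two workflows (e.g. proteome-only and proteogenomic) both detect the protein.
conf = posterior_confidence(0.5, [(True, 0.9, 0.1), (True, 0.8, 0.2)])
```

With these made-up rates, two agreeing detections lift a 0.5 prior to 36/37 (about 0.973). A single miss by a sensitive workflow would lower it instead.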
  • 27. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 28. DISK Walkthrough: Initial Hypothesis ๏ Initial hypothesis is provided by the user • PRKCDBP protein is expressed in a patient sample
  • 29. DISK Walkthrough: Lines of Inquiry ๏ The line of inquiry suggests finding data from different experiments done with the patient's sample, then running multi-omic workflows, and then combining the evidence into a confidence score • General hypothesis pattern • Data query pattern: search for different experiments that produced omics data (e.g., of type RNASeq and MassSpecData) • Data analysis workflows to run on genomics and proteomics data (more omics in the future) • Meta-workflows to assess confidence in the hypothesis based on workflow results
  • 30. DISK Walkthrough: Data & Workflows To test a hypothesis that a protein is present in a patient’s sample: ๏ Retrieve mass spec and RNASeq data ๏ Use workflows • Wf1: Proteome only • Wf2: ProteoGenomic
  • 31. DISK Walkthrough: Meta-Workflows ๏ After the workflows run, meta-workflows analyse the results and generate a confidence value
  • 32. DISK Walkthrough: Revised Hypothesis ๏ The hypothesis is revised and given a confidence value: • A mutation of the protein PRKCDBP has been expressed in the patient's sample TCGA-AA-3561-01A-22 with a confidence of 0.9887
  • 33. DISK Walkthrough: Provenance Details ๏ Hypothesis provenance stores information about the workflows run and the data used • Workflow execution provenance is published by WINGS using the W3C PROV standard.
  • 34. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 35. DISK: Automated DIscovery of Scientific Knowledge [Figure: architecture diagram with components labeled Hypotheses, Hypothesis Evaluation, Revised hypotheses & interesting findings, Interactive Discovery Agent, Formulate Lines of Inquiry, Lines of Inquiry, Data Retrieval, Data Repository, Workflow Binding, WINGS Intelligent Workflow System, Analytic Workflows, Workflow Constraints, Workflow Reasoning, Workflow Provenance, Meta-Analysis of Results, Meta-Workflows, Confidence Estimation, Benchmarking, and Open Publication of Results as Linked Data]
  • 36. Our Initial Focus: Reproduce Seminal Omics Analysis [Zhang et al 2014]
  • 37. Impact on Cancer Multi-Omics ๏ Replicated the [Zhang et al 2014] proteogenomic analysis of colorectal cancer ๏ Successfully reproduced the paper's findings, comparing results at multiple levels (final figure, supplementary tables, etc.) ๏ Replicating the paper's figures and supplemental figures took months and direct conversations with the authors ๏ Applying the analysis approach to a new cancer type now takes minutes • Useful when TCGA is integrated ๏ Expanded the analysis to • compare how sensitive the findings were to workflow details [Figure: density plots of Spearman correlation between mRNA and protein abundance (within samples) and between mRNA and protein variation (across samples)]
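The replication figures summarize Spearman correlations between mRNA and protein measurements. One reading of the two panels is as follows: "within samples" correlates the two measurements across genes for one sample, and "across samples" correlates them per gene across samples. Both reduce to the same computation. The sketch below is a standard stdlib Spearman implementation (the Pearson correlation of tie-averaged ranks) and is not the replication's actual code. The input values are illustrative.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


mrna = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative abundances for 5 genes
protein = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(mrna, protein)
```

Without ties, this agrees with the textbook formula 1 - 6Σd²/(n(n²-1)), which gives 0.8 for the example above.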
  • 38. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 39. Related Work 1) Discovery Systems ๏ [Lenat 1976] ๏ [Lindsay et al 1980] ๏ [Langley 1981] ๏ [Falkenhainer 1985] ๏ [Kulkarni and Simon 1988] ๏ [Cheeseman et al 1989] ๏ [Zytkow et al 1990] ๏ [Simon 1996] ๏ [Valdes-Perez 1997] ๏ [Todorovski et al 2000] ๏ [Schmidt and Lipson 2009]
  • 40. Related Work: 2) Hypothesis Representation as Graphs ๏ Existing vocabularies are related but need to be extended to represent hypotheses in DISK • SWAN [Gao et al 2006] • EXPO [Soldatova and King 2006] • Nanopublications [Groth et al 2010] • Ovopublications [Callahan and Dumontier 2013] • Micropublications [Clark et al 2014] • LSC • BEL
  • 41. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 42. Contributions ๏ Represent scientists' hypotheses • Hypothesis ontology includes revisions & provenance ๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued with a data analysis workflow • Lines of inquiry outline what type of data and workflows to use, and customize them to the hypotheses at hand ๏ Design a meta-analysis to assess the results of lines of inquiry and revise the original hypotheses • Meta-analysis workflows assess diverse evidence
  • 43. Ongoing & Future Work ๏ Ongoing work: • Interactive Discovery Agent that explains interesting findings • Continuous analysis of data (TCGA/CPTAC) as it grows • Extending and generalizing meta-workflows • Using DISK in geosciences: Subsurface water resource modeling ๏ Future challenges: • More complex hypotheses about several entities • Incorporate evidence over time • Designing domain-independent meta-workflows • Resource-bound hypothesis exploration