Good practices (and challenges) for reproducibility
“Give your samples a decent life”
Javier Quilez
Outline
● Make groups of 3 (ideally 2 wet-lab + 1 dry-lab)
● I will present sequentially several scenarios/challenges
● You will have a few minutes to think about how you would tackle them
● I will propose approaches that worked for me
#1
The life of your sample
Experiment (wet-lab domain) → Data (digital domain; file(s)) → Results (digital domain; + file(s))
What is your sample?
Experiment (wet-lab domain) → Data (digital domain; file(s)) → Results (digital domain; + file(s))
This is NOT enough
Is all the information needed available?
● Initial processing of the data
● Quality control
● Downstream analysis
● Reproducibility
● Data sharing and publication
Think
● What information (a.k.a. metadata) will describe your experiment?
● How will you collect the metadata?
● Who will have access to the metadata?
● Will the metadata be future-proof?
Systematically collect the metadata of the experiments
● Do it before processing the data
● Keep the form short and easy to complete
● Make it instantly accessible to authorized members of the team
● Make it easy to parse for humans and computers
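One way to make metadata "easy to parse for humans and computers" is to keep it as a plain table with a fixed set of columns and to refuse incomplete records. The sketch below is only an illustration: the field names, the example values and the CSV format are assumptions, not taken from the talk.

```python
import csv
import io

# Hypothetical required fields; adapt them to your own experiments.
REQUIRED_FIELDS = ["sample_id", "experiment_date", "species", "treatment", "operator"]


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def to_csv(records):
    """Serialize complete records as CSV text; refuse incomplete ones."""
    for record in records:
        missing = missing_fields(record)
        if missing:
            raise ValueError(f"incomplete metadata, missing: {missing}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical example record, filled in before any data processing starts.
example = {
    "sample_id": "sample001",
    "experiment_date": "2020-01-15",
    "species": "human",
    "treatment": "untreated",
    "operator": "jq",
}
print(to_csv([example]))
```

Rejecting incomplete records at collection time is a design choice: it is usually cheaper to ask for a missing value on the day of the experiment than months later.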
#2
Experiments will happen over time
Exp. 1: untreated (ctrl.txt), treated (t60.txt)
Exp. 2 (later in time): treated (T60.txt)
Which is your sample (and other issues)?
Untreated: ctrl.txt. Treated: t60.txt and T60.txt.
● Which “*60.txt” file corresponds to which treated experiment?
● What “*60” and “ctrl” mean may not be obvious and requires human interpretation
● Are both treated samples to be compared with the same untreated sample?
● The inconsistent use of lower and upper case complicates computer searches
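The effect of mixed case on computer searches can be shown with the three file names from the example. This is a minimal sketch using Python's standard `fnmatch` module; the search pattern is an assumption about how someone might look for the treated samples.

```python
import fnmatch

files = ["ctrl.txt", "t60.txt", "T60.txt"]

# A case-sensitive pattern silently misses the upper-case file name.
case_sensitive = [f for f in files if fnmatch.fnmatchcase(f, "t60*")]

# Normalizing the case finds both files, but still cannot tell
# which experiment each of them belongs to.
case_insensitive = [f for f in files if fnmatch.fnmatchcase(f.lower(), "t60*")]

print(case_sensitive)
print(case_insensitive)
```

Even the case-insensitive search only finds the files; the link between each file and its experiment still has to come from somewhere else.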
Think
● How will you name your samples?
● Will each name be truly unique?
● Will the name provide information about the sample and/or group similar samples?
● Is the naming future-proof (i.e. will it cope with more samples to come)?
● What will you label with the sample name (e.g. tubes, files)?
Establish a system: each sample a unique identifier
● Simplest way: an auto-incremental identifier (ID) (e.g. sample001, sample002, …)
● More complex options (sample ID based on metadata)
● Whichever you choose…
○ Unique
○ Computer-friendly (fixed length and pattern, all upper or all lower case)
○ Anticipate the number of samples that may be reached
● Trace your sample with its ID throughout its life: from the tube to the files
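The simplest scheme, an auto-incremental ID with a fixed length and pattern, could be sketched as follows. The width of three digits and the helper names are assumptions for illustration; a larger width would be needed if more than 999 samples are expected.

```python
import re

ID_WIDTH = 3  # hypothetical: room for sample001 .. sample999
ID_PATTERN = re.compile(rf"sample\d{{{ID_WIDTH}}}")


def make_sample_id(number):
    """Format a sample number as a fixed-length, lower-case ID."""
    if not 1 <= number < 10 ** ID_WIDTH:
        raise ValueError(f"sample number {number} does not fit in {ID_WIDTH} digits")
    return f"sample{number:0{ID_WIDTH}d}"


def is_valid_sample_id(text):
    """Check that a string follows the fixed pattern exactly."""
    return ID_PATTERN.fullmatch(text) is not None


def next_sample_id(existing_ids):
    """Return the next free ID given the IDs already assigned."""
    numbers = [int(sample_id[len("sample"):]) for sample_id in existing_ids]
    return make_sample_id(max(numbers, default=0) + 1)


print(next_sample_id(["sample001", "sample002"]))
print(is_valid_sample_id("Sample001"))
```

Because every ID has the same length and case, sorting and searching behave predictably for both people and programs.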
The sample ID links metadata and data
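As a rough illustration of that link, the sketch below looks up the metadata of each data file through the sample ID in its file name. The metadata table, the file paths and the convention that the file name is the sample ID are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical metadata table keyed by sample ID.
metadata = {
    "sample001": {"species": "human", "treatment": "untreated"},
    "sample002": {"species": "human", "treatment": "treated"},
}

# Hypothetical data files named after their sample ID.
data_files = [
    "data/sample001/step1/program_a/sample001.txt",
    "data/sample002/step1/program_a/sample002.txt",
    "data/sample003/step1/program_a/sample003.txt",
]


def sample_id_of(path):
    """Take the sample ID from the file name (the part before the extension)."""
    return PurePosixPath(path).stem


def link(metadata, data_files):
    """Group files by sample ID and report files that have no metadata."""
    linked, orphans = {}, []
    for path in data_files:
        sample_id = sample_id_of(path)
        if sample_id in metadata:
            linked.setdefault(sample_id, []).append(path)
        else:
            orphans.append(path)
    return linked, orphans


linked, orphans = link(metadata, data_files)
print(sorted(linked))
print(orphans)
```

Files without a metadata record are reported rather than ignored, which makes gaps in the metadata visible early.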
#3
Where are data and results?
Experiment → Data (file(s)) → Results (+ file(s))
Looking for Waldo is fun; looking for files is NOT
Think
● How will you organize your raw data?
● How will you organize your processed data?
● How will you organize your analysis results?
● Will human and computer searches be easy?
The life of your sample
Experiment (wet-lab domain) → (1) Raw data → (2) Processed data → (3) Analysis results
(1) Raw data - 1 directory per instrument run
● Files exactly as they come off the instrument
● Do not store modified, subsetted or merged files
● Quality control of raw files
(2) Processed data - 1 directory per sample
● Several subdirectories
○ Steps of the analysis pipeline
○ Logs of the programs used
○ File integrity verifications
● Subdirectories accommodate variations in the analysis pipelines
○ sample1/step1/program_a/sample1.txt
○ sample1/step1/program_b/sample1.txt
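The two example paths above follow one pattern, sample/step/program/file, which can be generated rather than typed. A minimal sketch, assuming a hypothetical top-level directory called `processed` and a `.txt` output file:

```python
from pathlib import PurePosixPath


def processed_path(sample_id, step, program, root="processed"):
    """Build <root>/<sample>/<step>/<program>/<sample>.txt."""
    return PurePosixPath(root) / sample_id / step / program / f"{sample_id}.txt"


# The same step run with two alternative programs lands in sibling directories.
for program in ("program_a", "program_b"):
    print(processed_path("sample1", "step1", program))
```

Generating paths from the sample ID, step and program keeps the layout identical for every sample, so both people and scripts know where to look.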
(3) Analysis results - projects and analysis directories
[Figure: example directory tree for project_a]
#4
Data analysis is hardly ever a one-time task
Experiment (wet-lab domain) → Data (digital domain; file(s)) → Results (digital domain; + file(s))
Can you seamlessly process multiple samples?
[Diagram: data and results accumulating over time]
Think
● Imagine you write code to process/analyze 1 sample:
○ How will it handle 100 samples?
○ Will 100 samples be processed in a reasonable time?
○ Will you have to manually configure sample-specific parameters?
○ Will you be able to run specific parts of your code?
Computer clusters to the rescue
[Diagram: a login node and the computing nodes of a cluster]
The naive first approach
[Diagram: for Sample 1, configure + execute Step 1 and Step 2]
And yet another sample…
[Diagram: for Sample 2, configure + execute Step 1 and Step 2 again]
How long can you go like this?
[Diagram: for Sample 3, configure + execute Step 1 and Step 2 yet again]
Your code’s wish-list
Scalability - 1 sample as easy as 100s
[Diagram: one configure + execute launches Step 1 and Step 2 for Samples 1, 2 and 3]
Parallelization - run all samples at the same time
[Diagram: Step 1 and Step 2 run for Samples 1, 2 and 3 at the same time]
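Scalability and sample-level parallelization can be sketched with one function that processes a single sample and one call that applies it to any number of samples at once. This is an illustration only: the step functions are placeholders, and a thread pool stands in for the computing nodes of a cluster, where each sample would more likely be submitted as a separate job.

```python
from concurrent.futures import ThreadPoolExecutor


def step1(sample_id):
    """Placeholder for the first pipeline step (hypothetical)."""
    return f"{sample_id}:step1"


def step2(step1_output):
    """Placeholder for the second pipeline step (hypothetical)."""
    return f"{step1_output}>step2"


def process_sample(sample_id):
    """Run the whole pipeline for one sample."""
    return step2(step1(sample_id))


def process_all(sample_ids, max_workers=4):
    """Run the pipeline for all samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_sample, sample_ids))


print(process_all(["sample001", "sample002", "sample003"]))
```

The call is the same for 1 sample or for 100; only the list of sample IDs changes.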
Parallelization - speed up individual steps
[Diagram: each step is split into parts that run in parallel; a step that takes 3 hours then takes 3 hours / 3 = 1 hour]
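Speeding up an individual step usually means splitting its input into parts, processing the parts in parallel and merging the partial results. The sketch below assumes a hypothetical step that counts reads of a minimum length. The "3 hours / 3 = 1 hour" figure is the ideal case: the real gain depends on the overhead of splitting and merging, and in Python, CPU-bound work generally needs processes or separate cluster jobs rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_long_reads(reads, min_length=5):
    """Hypothetical step: count reads that are at least min_length long."""
    return sum(1 for read in reads if len(read) >= min_length)


def parallel_count(reads, n_parts=3):
    """Run the step on each chunk concurrently and merge the partial counts."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        return sum(pool.map(count_long_reads, split(reads, n_parts)))


reads = ["ACGT", "ACGTACGT", "ACGTA", "AC", "ACGTAC", "A", "ACGTACG"]
print(parallel_count(reads) == count_long_reads(reads))
```

Checking that the split run gives the same answer as the unsplit run is a cheap safeguard whenever a step is parallelized this way.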
Automatic configuration - no per-sample tuning
[Diagram: Samples 1, 2 and 3 come from human, mouse and yeast; the species is taken from the metadata, so one configure + execute still runs Step 1 and Step 2 for all of them]
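Automatic configuration can be as simple as looking up sample-specific parameters in the metadata instead of typing them for each run. In this sketch the metadata table, the reference file paths and the `configure` helper are hypothetical.

```python
# Hypothetical species-specific settings; the reference paths are placeholders.
REFERENCE_BY_SPECIES = {
    "human": "references/human.fa",
    "mouse": "references/mouse.fa",
    "yeast": "references/yeast.fa",
}

# Hypothetical metadata collected before processing, keyed by sample ID.
METADATA = {
    "sample001": {"species": "human"},
    "sample002": {"species": "mouse"},
    "sample003": {"species": "yeast"},
}


def configure(sample_id, metadata=METADATA):
    """Derive the run configuration of a sample from its metadata."""
    species = metadata[sample_id]["species"].lower()
    if species not in REFERENCE_BY_SPECIES:
        raise ValueError(f"no reference configured for species {species!r}")
    return {
        "sample_id": sample_id,
        "species": species,
        "reference": REFERENCE_BY_SPECIES[species],
    }


for sample_id in METADATA:
    print(configure(sample_id))
```

An unknown species raises an error instead of falling back to a default, so a sample is never silently processed against the wrong reference.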
Modularity - execute it all or partially
[Diagram: Step 1 is already done (OK) for Samples 1, 2 and 3, so only Step 2 is executed]
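Modularity can be sketched as a pipeline whose steps are named, so that a run can execute all of them or only a selection. The two steps here are hypothetical stand-ins, and passing the state around as a dictionary is just one possible design.

```python
def step1(state):
    """Hypothetical first step: clean the raw input."""
    state["clean"] = state["raw"].strip().upper()
    return state


def step2(state):
    """Hypothetical second step: summarize the cleaned input."""
    state["length"] = len(state["clean"])
    return state


# Steps in pipeline order; the names are used to select what to run.
STEPS = [("step1", step1), ("step2", step2)]


def run(state, only=None):
    """Run all steps, or only those named in `only`, always in pipeline order."""
    state = dict(state)  # work on a copy so the caller's data is untouched
    executed = []
    for name, func in STEPS:
        if only is None or name in only:
            state = func(state)
            executed.append(name)
    return state, executed


# Full run from the raw input.
full, ran_all = run({"raw": " acgt "})
# Partial run: step 1 was already done, so only step 2 is executed.
partial, ran_some = run({"clean": "ACGT"}, only=["step2"])
print(ran_all, full["length"])
print(ran_some, partial["length"])
```

Re-running only the step that changed saves time when an early, expensive step has already completed correctly.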
#5
Data go through many procedures to generate results
[Diagram: data and results accumulating over time]
Can you or anybody else reproduce your results?
[Diagram: several results with question marks between them]
Consequences: little understanding, irreproducibility and harder identification of errors
Think
● How will you document your procedures?
● How will you store your code?
● How will others access your documentation?
Document, document and document
● Write in README files how and when software and accessory files (e.g. genome reference sequence, annotation) were obtained
● Allocate a directory for every task (even one as simple as sharing files)
● Code the core analysis pipeline so that it logs the output of the programs and verifies file integrity
● Document procedures using Markdown, Jupyter Notebooks, RStudio or the like
● Specify non-default variable values
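Logging program output and verifying file integrity can both be done with the Python standard library. The sketch below is one possible approach, not the pipeline used in the talk: the helper names are hypothetical, and SHA-256 is chosen here as the checksum although other algorithms (e.g. MD5) are also common.

```python
import hashlib
import subprocess


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == expected_sha256


def run_and_log(command, log_path):
    """Run a program and append its command line, exit code and output to a log."""
    result = subprocess.run(command, capture_output=True, text=True)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("command: " + " ".join(command) + "\n")
        log.write(f"exit code: {result.returncode}\n")
        log.write("stdout:\n" + result.stdout)
        log.write("stderr:\n" + result.stderr)
    return result.returncode
```

A typical use would be to record the checksum of each file when it is first written, call `verify` before any later step reads it, and wrap every external program of the pipeline in `run_and_log` so the log sits next to the results it explains.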
Take home message
● What is your sample? → Systematically collect the metadata of the experiments
● Which is your sample? → Establish a system: each sample a unique identifier
● Where are data and results? → Structured and hierarchical organization of the data
● Can you seamlessly process multiple samples? → Scalability, parallelization, automatic configuration and modularity of the code
● Can you or anybody else reproduce your results? → Document, document and document!
In case you forget the take home message…
The human factor is the greatest hurdle for reproducibility
Limit or control human intervention by automating every step of the data analysis as much as possible
It’s not you, it’s the lab culture
Food for thought
Your involvement in the data analysis is a choice
The data analysis itself is not
[Figure: your autonomy and your dependence on bioinformaticians, as a function of your involvement in the data analysis]
Thanks!
javier.quilez@crg.eu
Twitter: @jaquol
https://www.biorxiv.org/content/early/2017/08/29/136358
https://github.com/4DGenome/parallel_sequencing_lives