Using research software in a
production environment
Morgan Taschuk @morgantaschuk
Senior Manager, Genome Sequence Informatics
Ontario Institute for Cancer Research
ONTARIO INSTITUTE FOR CANCER RESEARCH
Genome Sequence Informatics
• Primary analysis and QC at OICR
• 8100 cores
• 2 petabytes of disk
• Supports dozens of projects and hundreds of publications
• Half bioinformaticians
• Half software developers/engineers
Est. 2008
Core process
We are consumers of research software
https://doi.org/10.1371/journal.pcbi.1005412
Good software is not enough
Big Data
Scale: 1 sequenced human whole genome is between 30 and 45 GB
• Genomics England’s 100 000 Genomes Project will take ~20 PB of disk to store
• Need to sequence between 5000 and 20 000 cases to confidently link rare variants with disease
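The figures on this slide can be checked with a little arithmetic. The sketch below assumes decimal units (1 PB = 1,000,000 GB), which the slide does not state. At 30-45 GB each, 100 000 genomes come to 3.0-4.5 PB, so the ~20 PB estimate implies roughly 200 GB per genome; the slide does not break that figure down.

```python
# Back-of-envelope storage arithmetic for the figures on this slide.
# Assumption: decimal units (1 PB = 1,000,000 GB); the slide does not say.
GB_PER_PB = 1_000_000


def raw_storage_pb(genomes: int, gb_per_genome: float) -> float:
    """Disk needed for `genomes` whole genomes at a given per-genome size."""
    return genomes * gb_per_genome / GB_PER_PB


def implied_gb_per_genome(total_pb: float, genomes: int) -> float:
    """Per-genome footprint implied by a total storage figure."""
    return total_pb * GB_PER_PB / genomes


low = raw_storage_pb(100_000, 30)
high = raw_storage_pb(100_000, 45)
implied = implied_gb_per_genome(20, 100_000)

print(f"{low:.1f}-{high:.1f} PB at 30-45 GB per genome")
print(f"~20 PB in total implies about {implied:.0f} GB per genome")
```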
Data is too big!!
Costs of whole genome sequencing (grey line) and computer power (Moore's law, black line).
Clinical and Translational Radiation Oncology 2017, 3, 16-20. DOI: 10.1016/j.ctro.2017.03.002
Translation to the clinic
• Only 10-25% of research can be translated into clinical practice
• Example: recommended laboratory test turnaround time is 14 days
• Genomic test turnaround, from biopsy to results, is ~35 days
Aung et al. Clin Cancer Res. 2018 doi: 10.1158/1078-0432.
Growing pains
OICR acquires a lot of sequencing instruments
GSI in 2017
• 17 staff, but only ~2 monitor this system
• 90,098 analysis workflows executed on human whole genome, exome, targeted panel, and RNA sequencing
• 1 successful workflow every 6 minutes
• Vast majority of data never needs human intervention
• My goal is/was to reduce turnaround time… stay tuned for the end of the talk
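The "every 6 minutes" figure follows from the yearly count. The sketch below assumes the 90,098 runs were spread evenly over a 365-day year and that nearly all of them succeeded; neither assumption is stated on the slide.

```python
# Check the "1 successful workflow every 6 minutes" figure.
# Assumption: runs are spread evenly across a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60


def minutes_between_runs(runs_per_year: int) -> float:
    """Average gap between workflow runs, in minutes."""
    return MINUTES_PER_YEAR / runs_per_year


interval = minutes_between_runs(90_098)
print(f"one workflow roughly every {interval:.1f} minutes")
```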
Our Current Approach
1. Nothing should be on fire
Our Current Approach
1. Control our inputs (data and metadata)
2. As little human intervention as possible
3. Fail fast, fail loudly
4. Totally traceable and reproducible
Total assimilation
• Borg’ed out on supply chain management
• Assimilate all aspects of metadata and data management to ensure consistent quality
Caveat
Our Approach
[Diagram labels: Monitoring; Valid metadata; Automation; Workflow system; Research software; HPC; Genomics; Reports]
Valid metadata entering an automated system running on robust software with reproducible results - and everything tracked and monitored.
Total assimilation
[Diagram labels: Valid metadata; Workflow system; Automation; Reports/Data; Genomics; SCIENCE!!]
Only good metadata enter
• Control and validate metadata as far upstream as we can
• Laboratory Information Management System (LIMS)
MISO LIMS as the metadata solution
MISO as the metadata solution
• Since 2017, MISO LIMS
• open source
• completely customizable
• Validate data at entry
• Sanity checks
• Reduce data entry and thus reduce data entry errors
https://github.com/TGAC/miso-lims
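Validation at entry can be illustrated with a small, self-contained sketch. The field names, identifier pattern, and library-type codes below are hypothetical and chosen only for illustration; they are not MISO's schema or API.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical rules for illustration only; not MISO's schema or API.
ALLOWED_LIBRARY_TYPES = {"WG", "EX", "TS", "RNA"}
SAMPLE_ID_PATTERN = re.compile(r"^[A-Z]{2,6}_\d{4}$")


@dataclass
class SampleRecord:
    sample_id: str
    project: str
    library_type: str
    concentration_ng_ul: float


def validate(record: SampleRecord) -> List[str]:
    """Return every problem found; an empty list means the record may enter."""
    problems = []
    if not SAMPLE_ID_PATTERN.match(record.sample_id):
        problems.append(f"sample_id {record.sample_id!r} is not in PROJ_0001 style")
    if not record.project.strip():
        problems.append("project is empty")
    if record.library_type not in ALLOWED_LIBRARY_TYPES:
        problems.append(f"unknown library_type {record.library_type!r}")
    # Sanity check: a measured concentration must be a positive number.
    if record.concentration_ng_ul <= 0:
        problems.append("concentration_ng_ul must be positive")
    return problems


good = SampleRecord("PROJ_0001", "PROJ", "WG", 12.5)
bad = SampleRecord("proj-1", "", "WGS?", -1.0)

for record in (good, bad):
    issues = validate(record)
    print(record.sample_id, "OK" if not issues else issues)
```

Collecting all problems rather than stopping at the first one lets the person entering the data fix a record in a single pass.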
Automation
• Deciders:
• take in metadata and data
• decide what analysis to perform using rules (if-then; map-reduce; etc.)
• check whether data has previously been analyzed
• check if system is at capacity
• Difficult to write
• especially when metadata is poor
• software needs to understand all metadata
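A decider of the if-then kind described above might look like the following minimal sketch. The `Lane` record, the library-type codes, and the workflow names are hypothetical; GSI's actual deciders are not shown in the slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Lane:
    lane_id: str
    library_type: str  # hypothetical codes: "WG", "EX", "RNA"
    has_fastq: bool


# A rule is an (if, then) pair: a predicate on the lane and a workflow name.
Rule = Tuple[Callable[[Lane], bool], str]

RULES: List[Rule] = [
    (lambda lane: lane.library_type in {"WG", "EX"}, "dna_alignment"),
    (lambda lane: lane.library_type == "RNA", "rna_alignment"),
]


def decide(lane: Lane, already_done: Set[Tuple[str, str]],
           running: int, capacity: int) -> Optional[str]:
    """Return the workflow to launch for this lane, or None to skip it."""
    if not lane.has_fastq:
        return None  # no data yet
    if running >= capacity:
        return None  # system at capacity; try again on the next cycle
    for condition, workflow in RULES:
        if condition(lane):
            if (lane.lane_id, workflow) in already_done:
                return None  # previously analyzed
            return workflow
    # Fail loudly on metadata the rules do not understand.
    raise ValueError(f"no rule matches library_type {lane.library_type!r}")


done = {("L1", "dna_alignment")}
print(decide(Lane("L1", "WG", True), done, running=3, capacity=10))
print(decide(Lane("L2", "RNA", True), done, running=3, capacity=10))
```

The final `raise` reflects the point that the software needs to understand all metadata: an unrecognized value is treated as an error rather than silently skipped.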
Monitoring
• Track everything before you need it
• Silence on success
• but make sure you detect when systems go offline!
• Dashboards and tickets instead of emails
• Fail fast, fail loudly
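"Silence on success" only works if silence from a dead system is also caught. One common way to do that is a heartbeat check, sketched below; the service names and the 15-minute threshold are made up for illustration, and the `print` call stands in for opening a ticket.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_services(last_seen: Dict[str, datetime], now: datetime,
                   max_silence: timedelta) -> List[str]:
    """Names of services whose last heartbeat is older than max_silence."""
    return sorted(name for name, seen in last_seen.items()
                  if now - seen > max_silence)


now = datetime(2018, 1, 1, 12, 0)
heartbeats = {
    "decider": now - timedelta(minutes=2),           # healthy, stays silent
    "workflow-engine": now - timedelta(minutes=45),  # has gone quiet
}

offline = stale_services(heartbeats, now, timedelta(minutes=15))
for name in offline:
    # Stand-in for opening a ticket; a real system would call its tracker.
    print(f"TICKET: no heartbeat from {name} in over 15 minutes")
```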
How machines are performing...
Whether I should worry about disk...
Tickets and alerts instead of emails
Automatic of course
Workflows
• Workflow systems:
• take in input data and parameters
• run the data through analysis steps
• produce data
• Analysis steps:
• Good research software
• Absolutely critical and integral to all other systems discussed so far
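The three responsibilities of a workflow system listed above can be sketched in a few lines. This is a toy runner, not the workflow system GSI uses; the step functions stand in for real research software, and the parameter fingerprint is one simple way to keep a run traceable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any, Dict[str, Any]], Any]]


def run_workflow(steps: List[Step], data: Any, params: Dict[str, Any]):
    """Run data through each step in order; return the result and a trace."""
    # Fingerprint the parameters so a run can be matched to its settings later.
    fingerprint = hashlib.sha256(
        json.dumps(params, sort_keys=True).encode()).hexdigest()[:12]
    trace = {"params": fingerprint, "steps": []}
    for name, step in steps:
        try:
            data = step(data, params)
        except Exception as err:
            # Fail fast: stop at the first broken step and say which one.
            raise RuntimeError(f"step {name!r} failed: {err}") from err
        trace["steps"].append(name)
    return data, trace


# Toy analysis steps standing in for real research software.
def trim(reads, params):
    return [r[: params["max_len"]] for r in reads]


def drop_short(reads, params):
    return [r for r in reads if len(r) >= params["min_len"]]


result, trace = run_workflow(
    [("trim", trim), ("drop_short", drop_short)],
    ["ACGTACGT", "AC", "ACGTT"],
    {"max_len": 6, "min_len": 3},
)
print(result, trace["steps"])
```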
Having good software is not enough
[Diagram labels: Monitoring; Metadata validation; Automation; Workflow systems; Software]
Turnaround time
• Sequencing to alignment has dropped from about 20 days to 7 days for HiSeq whole genome lanes
• Anecdotal: variability reduced, hands-on time reduced
Current/future work
• Automation
• make it simpler
• more complete
• (never going to be done)
• Research is a changing field by nature
• Flexibility versus robustness
• Hot new things: sc-seq, ct-seq, immuno-onco-genomics
• Underlying assumptions change over time
We’re investing in good infrastructure
Turonno! Entry-level! Look for GSI! Software dev! Report to me!!
Apply! http://bit.ly/oicr-gsi-dev
Conclusions
• The FUTURE is
• hundreds of thousands of samples
• expediting clinical results
• no loss of reproducibility or quality
• Everyone needs a little production-style infrastructure, even if you’re not production
• control your metadata!
• automate!
• standardize your analysis!
• monitor all the things!
Acknowledgements
• Lars Jorgensen
• Lawrence Heisler
• Michael Laszloffy
• Heather Armstrong
• Dillan Cooke
• Andre Masella
• Iain Bancarz
• Timothy Beck
• Peter Ruzanov
• Prisni Rath
• Jonathan Torchia
• Richard Jovelin
• Yogi Sundaravadanam
• Xuemei Luo
• Many excellent co-op students
To past and current GSI members.
OICR Technology Programs enable cancer research in
Ontario by providing value-added expertise, training and
access to high-end infrastructure and technologies.
Find out more at oicr.on.ca
This project was supported by the
OICR Adaptive Oncology Program
Funding for the Ontario Institute for Cancer Research
is provided by the Government of Ontario
Attributions
Jensflorian CC BY-SA 3.0
Timothy Dilich - Noun Project, CC0
http://andrewjrobinson.github.io/training_docs/tutorials/variant_calling_galaxy_1/variant_calling_galaxy_1/
By David pogrebeshsky [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons
Star Trek ® Paramount Pictures
GSI on the web
https://github.com/oicr-gsi
https://gsi.oicr.on.ca

Functional group interconversions(oxidation reduction)
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 

Using research software in a production environment

  • 1. Using research software in a production environment Morgan Taschuk @morgantaschuk Senior Manager, Genome Sequence Informatics Ontario Institute for Cancer Research
  • 2. ONTARIO INSTITUTE FOR CANCER RESEARCH Genome Sequence Informatics • Primary Analysis and QC at OICR • 8100 cores • 2 petabytes of disk • Support dozens of projects, 100s of publications • Half bioinformaticians • Half software developers/engineers • Established 2008
  • 3. Core process
  • 4. We are consumers of research software
  • 5. https://doi.org/10.1371/journal.pcbi.1005412
  • 6. Good software is not enough
  • 7. Big Data • Scale: 1 sequenced human whole genome is between 30-45 GB • Genomics England’s 100 000 Genomes Project will take ~20 PB of disk to store • Need to sequence between 5000-20 000 cases to confidently link rare variants with disease
  • 8. Data is too big!! Costs of whole genome sequencing (grey line) and computer power (Moore's law, black line). Clinical and Translational Radiation Oncology 2017 3, 16-20. DOI: 10.1016/j.ctro.2017.03.002
  • 9. Translation to the clinic • Only 10-25% of research can be translated into clinical practice • Example: Recommended laboratory test turnaround time is 14 days • For genomics tests, the time between biopsy and results is ~35 days. Aung et al. Clin Cancer Res. 2018 doi: 10.1158/1078-0432.
  • 10. Growing pains: OICR acquires a lot of sequencing instruments
  • 11. (no text on slide)
  • 12. (no text on slide)
  • 13. GSI in 2017 • 17 staff but only ~2 monitor this system • 90,098 analysis workflows executed on human whole genome, exome, targeted panels, and RNA sequencing • 1 successful workflow every 6 minutes • Vast majority of data never needs human intervention • My goal is/was to reduce turnaround time… stay tuned for the end of the talk
  • 14. Our Current Approach: 1. Nothing should be on fire
  • 15. Our Current Approach: 1. Control our inputs (data and metadata) 2. As little human intervention as possible 3. Fail fast, fail loudly 4. Totally traceable and reproducible
  • 16. Total assimilation • Borg’ed out on supply chain management • Assimilate all aspects of metadata and data management to ensure consistent quality
  • 18. Our Approach (diagram labels: Monitoring; Valid metadata; Workflow system; Automation; Genomics; Reports; HPC; Research Software). Valid metadata entering an automated system running on robust software with reproducible results - and everything tracked and monitored.
  • 19. Total assimilation (diagram labels: Valid metadata; Workflow system; Automation; Reports/Data; Genomics; SCIENCE!!)
  • 20. Only good metadata enter • Control and validate metadata as far upstream as we can • Laboratory Information Management System (LIMS)
  • 21. MISO LIMS as the metadata solution
  • 22. MISO as the metadata solution • Since 2017, MISO LIMS • open source • completely customizable • Validate data at entry • Sanity checks • Reduce data entry and thus reduce data entry errors https://github.com/TGAC/miso-lims
  • 23. Automation • Deciders: • take in metadata and data • decide what analysis to perform using rules (if-then; map-reduce; etc.) • check whether data has previously been analyzed • check whether the system is at capacity • Difficult to write • especially when metadata is poor • software needs to understand all metadata
  • 24. Monitoring • Track everything before you need it • Silence on success • but make sure you detect when systems go offline! • Dashboards and tickets instead of emails • Fail fast, fail loudly
  • 25. How machines are performing...
  • 26. Whether I should worry about disk...
  • 27. Tickets and alerts instead of emails. Automatic, of course
  • 28. Workflows • Workflow systems: • take in input data and parameters • run the data through analysis steps • produce data • Analysis steps: • Good research software • Absolutely critical and integral to all other systems discussed so far
  • 29. Having good software is not enough (diagram labels: Monitoring; Metadata validation; Automation; Workflow systems; software)
  • 30. Turnaround time • Sequencing to alignment has dropped from about 20 days to 7 days for HiSeq whole genome lanes • Anecdotal: Variability reduced, hands-on time reduced
  • 31. Current/future work • Automation • make it simpler • more complete • (never going to be done) • Research is a changing field by nature • Flexibility versus robustness • Hot new things: sc-seq, ct-seq, immuno-onco-genomics • Underlying assumptions change over time
  • 32. We’re investing in good infrastructure. Turonno! Entry-level! Look for GSI! Software dev! Report to me!! Apply! http://bit.ly/oicr-gsi-dev
  • 33. Conclusions • The FUTURE is • hundreds of thousands of samples • expediting clinical results • no loss of reproducibility or quality • Everyone needs a little production-style infrastructure, even if you’re not production • control your metadata! • automate! • standardize your analysis! • monitor all the things!
  • 34. Acknowledgements: To past and current GSI members. • Lars Jorgensen • Lawrence Heisler • Michael Laszloffy • Heather Armstrong • Dillan Cooke • Andre Masella • Iain Bancarz • Timothy Beck • Peter Ruzanov • Prisni Rath • Jonathan Torchia • Richard Jovelin • Yogi Sundaravadanam • Xuemei Luo • Many excellent co-op students
  • 35. OICR Technology Programs enable cancer research in Ontario by providing value-added expertise, training and access to high-end infrastructure and technologies. Find out more at oicr.on.ca This project was supported by the OICR Adaptive Oncology Program
  • 36. Funding for the Ontario Institute for Cancer Research is provided by the Government of Ontario
  • 37. Attributions • Jensflorian, CC BY-SA 3.0 • Timothy Dilich - Noun Project, CC0 • http://andrewjrobinson.github.io/training_docs/tutorials/variant_calling_galaxy_1/variant_calling_galaxy_1/ • By David pogrebeshsky [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons • Star Trek ® Paramount Pictures
  • 38. GSI on the web • https://github.com/oicr-gsi • https://gsi.oicr.on.ca
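
The "decider" idea from slide 23, combined with "fail fast, fail loudly" from slides 15 and 24, can be sketched roughly as below. This is only an illustration under stated assumptions, not GSI's actual implementation: the field names, library-type codes, workflow names, and the `decide` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical metadata fields a record must carry before any analysis runs.
REQUIRED_FIELDS = ("sample_id", "library_type", "fastq_path")

# Hypothetical if-then rules: library type -> workflow to launch.
WORKFLOW_RULES = {
    "WG": "whole_genome_alignment",
    "EX": "exome_alignment",
    "TS": "targeted_panel_alignment",
    "RNA": "rna_seq_alignment",
}


class MetadataError(ValueError):
    """Raised as soon as bad metadata is seen (fail fast, fail loudly)."""


@dataclass(frozen=True)
class Decision:
    sample_id: str
    workflow: Optional[str]  # None means "do not launch anything now"
    reason: str


def decide(record, already_analyzed, running_jobs, max_jobs):
    """Decide which workflow, if any, to launch for one metadata record."""
    # 1. Validate metadata up front instead of failing deep inside a workflow.
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise MetadataError(
            f"record {record.get('sample_id', '?')} is missing {missing}"
        )
    library_type = record["library_type"]
    if library_type not in WORKFLOW_RULES:
        raise MetadataError(f"unknown library type: {library_type!r}")

    # 2. Apply the rule to pick the analysis.
    sample_id = record["sample_id"]
    workflow = WORKFLOW_RULES[library_type]

    # 3. Skip data that has already been analyzed with this workflow.
    if (sample_id, workflow) in already_analyzed:
        return Decision(sample_id, None, "already analyzed")

    # 4. Hold back when the system is at capacity.
    if running_jobs >= max_jobs:
        return Decision(sample_id, None, "system at capacity")

    return Decision(sample_id, workflow, "launch")
```

A caller would pass each LIMS record to `decide` together with the set of `(sample_id, workflow)` pairs that have already run and the current job count, then launch only the decisions whose `workflow` is not `None`. Raising on bad metadata rather than skipping it silently is what turns a poor record into a ticket instead of a missing result.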