How Can We Make Genomic
Epidemiology a Widespread Reality?
William Hsiao, Ph.D.
William.hsiao@bccdc.ca
@wlhsiao
BC Public Health Microbiology and Reference Laboratory
BCCDC Grand Round May 26 2015
Outline
• Part 1: What is genomic epidemiology and
Why is it important for public health
microbiology
• Part 2: What are the requirements to bring
genomic epidemiology to routine public
health practice
– Introducing our project IRIDA as part of the
solution
Source: Peter Gleick, Scienceblogs.com
People
Place
Time
Source: Melanie Courtot
Molecular Epidemiology
• Laboratory generated biomarker results can
be correlated to epidemiological investigations
(People, Place, Time)
• Provides linkage based on common exposure
to the same pathogen at the molecular level
• Most tests detect one or a few specific
biomarkers, representing a fraction of the
pathogens’ genetic information
Current Methods of Characterizing Foodborne
Pathogens in a Public Health Laboratory
• Growth characteristics
• Phenotypic panels
• Agglutination reactions
• Enzyme immunoassays (EIAs)
• PCR
• DNA arrays (hybridization)
• Sanger sequencing of marker genes
• DNA restriction
• Electrophoresis (PFGE, capillary)
Each pathogen is characterized by pathogen-specific methods, so each pathogen
needs its own separate workflow. Turnaround time (TAT): 5 min – weeks
(months)
Source: Rebecca Lindsey
Genomic Epidemiology
Def: Using whole genome sequencing data from
pathogens and epidemiological investigations
to track spread of an infectious disease
Why Genomic Epidemiology
• One technology (DNA sequencing) compatible with
many types of pathogens
• Capable of generating 10-1000s of high quality
pathogen genomes within 1-7 days
Sequencing = lots of HQ Data
• Capture the pathogen’s entire genetic makeup
• Unbiased (~97-99+% of the genome captured using
common sequencing approaches)
• Significantly more data than traditional methods
• Allow higher resolution and higher sensitivity methods to
be applied
• Allow value-added
evolutionary & Functional
study of the pathogens
– Virulence factors
– AMR genes
Sequencing cost continues to drop
[Figure: sequencing cost over time, from $100M per human genome down to $10K per human genome or $10 per bacterial genome]
Variations in genomes = Basis of
Comparison
• Mutations
– Point mutations
– Small insertions and deletions (indels)
– Can change functions of a gene
• Recombination, deletion, and duplication
– Rearrange genes, can change expression
– Increase gene copy number
– Delete genes
• Horizontal gene transfer
– Acquiring genetic material from non-parental organism
• E.g. Antibiotic resistance / new toxins
SNP Analysis
• What is a SNP?
– A SNP (single nucleotide polymorphism) is DNA
sequence variation occurring when a single nucleotide
differs between two or more genomes
ATCGCGATATCATACGG
ATCGCAATATCATACGG
ATCGCGATATCATACGG
ATCGCGATATCATACGG
ATCGCAATATCATACGG
• A SNP can arise from a point mutation, but can
also arise from the insertion or deletion of
one nucleotide
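The alignment on this slide can be scanned column by column to locate SNPs. The following is a minimal Python sketch, assuming the sequences are already aligned to equal length; the function name `find_snp_columns` is illustrative and not part of any tool named in this talk.

```python
from typing import List, Tuple


def find_snp_columns(aligned: List[str]) -> List[Tuple[int, str]]:
    """Return (0-based position, bases in that column) for each variable column."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos in range(length):
        column = "".join(seq[pos] for seq in aligned)
        # A column is a SNP site when more than one base is observed in it.
        if len(set(column)) > 1:
            snps.append((pos, column))
    return snps


# The five aligned sequences shown on the slide.
sequences = [
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
]
snp_columns = find_snp_columns(sequences)
print(snp_columns)  # [(5, 'GAGGA')]
```

Only one column differs: position 5 (0-based), where two of the five genomes carry A instead of G.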
Why are SNPs useful
• Silent mutations that do not change protein
sequences happen quite frequently due to
DNA replication errors => High Resolution
• SNPs occur across the whole genome and can
be detected from whole genome sequencing
=> Unbiased markers
• SNPs can also be used to infer phylogeny of
organisms
– More shared SNPs = more closely related
SNP Minimal Spanning Tree – colored by Phage Type
PT8
PT4
PT13a
PT52
The most similar isolates are connected first => clustering them together
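As a rough illustration of how a SNP minimal spanning tree connects the most similar isolates first, here is a small Python sketch using Prim's algorithm on pairwise SNP counts. The isolate names and sequences are made up for the example, and this is not the method used to draw the trees on these slides.

```python
from typing import Dict, List, Tuple


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def minimum_spanning_tree(isolates: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm: repeatedly attach the closest isolate not yet in the tree."""
    names = sorted(isolates)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest edge from the tree to an outside isolate;
        # ties are broken by isolate name so the result is deterministic.
        dist, u, v = min(
            (snp_distance(isolates[u], isolates[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical aligned isolates (illustrative data only).
isolates = {
    "A": "AAAAAAAA",
    "B": "AAAAAAAT",
    "C": "AAAAAATT",
    "D": "TTTTAAAA",
}
tree = minimum_spanning_tree(isolates)
print(tree)  # [('A', 'B', 1), ('B', 'C', 1), ('A', 'D', 4)]
```

The closely related isolates A, B, and C are joined by 1-SNP edges, while the distant isolate D attaches last through its nearest neighbour.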
SNP Minimal Spanning Tree – colored by outbreaks
Many phylogenetic trees based on SNPs
published to show clustering of outbreak cases
den Bakker et al Emerg Infect Dis. 2014 Aug;20(8)
Non-related
cases
Outbreak
cases
Allard, M et al. PLoS ONE 8 (1) 2013
Forces Driving Pathogen Genome Evolution
[Figure labels: Specialization (“lean and mean”); New function can be derived through: (list not recoverable); Gene expression can be turned on and off]
Intra-cluster distances overlap with inter-cluster
distances
Leekitcharoenphon, et al. 2014. PLoS ONE 9 (2). doi:10.1371/journal.pone.0087991.
Different species have different clustering
distances
Leekitcharoenphon, et al. 2014. PLoS ONE 9 (2). doi:10.1371/journal.pone.0087991.
Genomics + Epidemiology
• Having genetic distance information alone
may not be enough to fully characterize
outbreaks
• Need to combine with epidemiological
investigations
• Using known clusters to establish
(sub-)species-specific genetic distance criteria
• Genomics can help connect previously
unlinked cases and uncover new cases
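One way to picture "using known clusters to establish distance criteria" is to compare within-cluster and between-cluster SNP distances for isolates whose outbreak membership is already known from epidemiology. The sketch below is a simplified Python illustration; the isolate names, distances, and the `distance_criteria` function are hypothetical.

```python
from typing import Dict, Tuple


def distance_criteria(
    distances: Dict[Tuple[str, str], int],
    cluster_of: Dict[str, str],
) -> Dict[str, object]:
    """Summarize intra- versus inter-cluster distances for known clusters."""
    intra = [d for (a, b), d in distances.items() if cluster_of[a] == cluster_of[b]]
    inter = [d for (a, b), d in distances.items() if cluster_of[a] != cluster_of[b]]
    if not intra or not inter:
        raise ValueError("need at least one within-cluster and one between-cluster pair")
    max_intra, min_inter = max(intra), min(inter)
    return {
        "max_intra": max_intra,
        "min_inter": min_inter,
        # If the ranges overlap, distance alone cannot separate the clusters.
        "separable": max_intra < min_inter,
    }


# Hypothetical epidemiologically confirmed clusters and pairwise SNP distances.
cluster_of = {"iso1": "outbreak_X", "iso2": "outbreak_X",
              "iso3": "outbreak_Y", "iso4": "outbreak_Y"}
distances = {
    ("iso1", "iso2"): 3, ("iso3", "iso4"): 5,
    ("iso1", "iso3"): 40, ("iso1", "iso4"): 42,
    ("iso2", "iso3"): 41, ("iso2", "iso4"): 44,
}
criteria = distance_criteria(distances, cluster_of)
print(criteria)  # {'max_intra': 5, 'min_inter': 40, 'separable': True}
```

When `separable` is false, the intra- and inter-cluster ranges overlap, which is the situation where epidemiological evidence is needed alongside the genomic distance.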
Each year, one in eight Canadians (or
four million people)
get sick with a domestically acquired
food-borne illness.
http://www.phac-aspc.gc.ca/efwd-emoha/efbi-emoa-eng.php
Whole Genome Sequencing of Foodborne
Pathogens Around the World
• UK Public Health England committed to sequence all the
Salmonella isolates submitted to PH Lab
• US FDA and CDC (supported by National Center for
Biotechnology Information) created a distributed network
of labs to utilize WGS for pathogen identification
https://publichealthmatters.blog.gov.uk/2014/01/20/innovations-in-genomic-sequencing/
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm
Genome Canada Bioinformatics Competition: Large-Scale Project
“A Federated Bioinformatics Platform for
Public Health Microbial Genomics”
Our Goal
The IRIDA platform
(Integrated Rapid Infectious Disease Analysis)
An open source, standards compliant, high quality genomic epidemiology
analysis platform based on web technology to support real-time
(food-borne) disease outbreak investigations
www.IRIDA.ca
Partnership among public health agencies and academic institutes to bridge the gaps
between advancements in genomic epidemiology and application to real-life and
real-time use cases in public health agencies
- Project Team has direct access to state of the art research in academia
- Project Team is directly embedded in user organization
National
Public Health Agency
Provincial
Public Health Agency
Academic/Public
IRIDA Project Phases
• Phase 1: genomics process and analysis pipeline to
produce categorical data (MLST and SNPs) suitable for
current epidemiological analysis – almost completed
• Phase 2: combine the categorical data with
epidemiological data (line list approach to replace
current Excel based approach) – in progress
• Phase 3: Develop IRIDA as an exploratory platform for
new ways of interpreting genomics data in light of
epidemiological and clinical data – in progress;
continuous process beyond current project
Interviews with key personnel to identify
barriers to implement genomic epidemiology in
public health agencies
GAP 1: PUBLIC HEALTH PERSONNEL
LACK TRAINING IN GENOMICS
Microbial genomics has been a valuable
research tool
• Help us understand:
– microbial evolution
– pathogenesis
– create novel industrial processes
– create new laboratory tests
• Use historical isolates – not real time
• Use of laboratory strains – no associated rich
clinical and epidemiological metadata
Cultural and Practical Differences
Genomics Research Laboratory | Genomics Diagnostic Laboratory
Curiosity driven | Production / Case driven
Exploratory analysis tolerated | Exploratory analysis discouraged
Reproducibility = other labs’ problem | Reproducibility critical
Tweaking protocols desirable | Stability in protocols desirable
Protocols don’t need to be validated | Protocols need to be validated
Novelty justifies the high cost of experiment | Conscious of cost per unit test; tests need to be scalable
How do we bridge the cultural and the practical differences?
Solution 1a: Build a User Friendly, high quality
analysis platform to process genomics data
• Carefully designed and engineered software platform is
just the starting point…
[Diagram: platform components: User Interface, Security, File system, Metadata Storage, Application logic, REST API, Workflow Execution Manager, Continuous Integration, Documentation]
• Easy to use interface hiding the technical details
Solution 1b: Build Portable and Transparent
Pipelines
• Use Galaxy as workflow engine – large
community support
• Retooled to address usability, security, and
other limitations
• Version Controlled Pipeline Templates
• Input files, parameters, and workflow are
sent to IRIDA-specific Galaxy for execution
• Results and provenance information are
copied from Galaxy
[Diagram components: IRIDA UI/DB; REST API; Galaxy (Assembly Tools, Variant Calling Tools, …); Shared File System; Workers]
1. Input files sent to Galaxy
2. Tools executed on Galaxy workers
3. Results downloaded from Galaxy
Source: Franklin Bristow
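To illustrate what "provenance information" for a version-controlled pipeline run might contain, here is a hedged Python sketch. It does not use the IRIDA or Galaxy APIs; `ProvenanceRecord`, `build_record`, `record_id`, and the example workflow name and parameters are all hypothetical. The idea shown is only that workflow version, parameters, and input checksums together identify a run reproducibly.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class ProvenanceRecord:
    workflow_name: str
    workflow_version: str
    parameters: Dict[str, str]
    input_checksums: Dict[str, str]  # file name -> SHA-256 of its content


def build_record(name: str, version: str, parameters: Dict[str, str],
                 inputs: Dict[str, bytes]) -> ProvenanceRecord:
    """Capture what was run: workflow version, parameters, and input file hashes."""
    checksums = {fn: hashlib.sha256(data).hexdigest() for fn, data in inputs.items()}
    return ProvenanceRecord(name, version, dict(parameters), checksums)


def record_id(record: ProvenanceRecord) -> str:
    """Stable identifier: identical workflow, parameters, and inputs give the same id."""
    payload = json.dumps(asdict(record), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical run description (names and values are illustrative only).
inputs = {"sample1.fastq": b"@read1\nACGT\n+\nIIII\n"}
run_a = build_record("snp-pipeline", "0.1.0", {"min_coverage": "10"}, inputs)
run_b = build_record("snp-pipeline", "0.1.0", {"min_coverage": "10"}, inputs)
run_c = build_record("snp-pipeline", "0.1.0", {"min_coverage": "15"}, inputs)
print(record_id(run_a) == record_id(run_b))  # True
print(record_id(run_a) == record_id(run_c))  # False
```

Changing any parameter, input file, or workflow version changes the identifier, so two results with the same identifier can be treated as comparable.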
Solution 1c: Start the training NOW!
• Canada’s National Microbiology Laboratory has hosted
genomic workshops for partners and collaborators
• At PHMRL, we have been conducting workshops to train
technologists and researchers on some common genomic
analysis tools
• IRIDA Project has dedicated funding for hosting workshops in
4Q of 2015 and 2016
• We would like to engage epidemiologists in future
training as well
GAP 2: INFORMATION SHARING IS
INEFFICIENT AND AD-HOC
Many Players in surveillance and outbreak –
ineffective information sharing
Source: M. Taylor, BCCDC
Provincial public
health dept.
National laboratory
Local public
health dept.
Provincial
laboratory
Cases
Physicians Frontline lab
Information
Bioinformatics and Analytical Capacities
Many Systems used in Reporting Diseases –
require data re-entry and re-coding
National Ministry of
Health
Provincial public
health dept.
National laboratory
Local public
health dept.
Provincial
laboratory
Cases
Physicians Local laboratory
Fax/Electronic
Fax
Phone/Fax
Electronic/Paper
Electronic/Fax/Phone
Mailing of Samples/Fax/Electronic
Source: M. Taylor, BCCDC
Semantic Web
Credit: http://www.cs.rpi.edu/~hendler/
• Semantic web is a suitable technology framework to
organize and share arbitrary datasets
What’s the web?
• World-Wide-Web (WWW) is a platform where
– Information is distributed (CBC for news, Netflix
for Movies, etc.)
– Information is heterogeneous (text, video,
pictures)
– (relevant) Information is linked by hyperlinks
– Often, information is only human readable
– Often, information is incorrect
– Often, information is not attributed
What’s Semantic web?
• Semantic web inherits many of the (good) attributes of
WWW (distributed, open, heterogeneous, and linked)
• It’s designed to be:
– machine readable based on a common language of logic
– Linking information can be automated making data sharing
easier
– Easier to describe granular data
– Errors can be detected based on logical reasoning
– Information can be attributed and can be made to persist
– “Smart Web”
IRIDA uses semantic web technologies to
address information management issues
• Solutions:
– 2a: Localized Instance of federated databases
– 2b: Permission Control – authentication/authorization for
information sharing
– 2c: User role-based display of information
Solution 2a: Local/Cloud Instances and Data
Federation
• Data processing capacity pushed to data generating
labs
• Allow data sharing securely for enhanced analysis
• Eventually cultivating a culture of openness of data
sharing and collaborative development of tools
Authorization
Solution 2b: Security
• Local authorization per instance.
• Method-level authorization.
• Object-level authorization.
• Allow secure, fine grained and
flexible information sharing
controlled by data producer
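The difference between method-level and object-level authorization can be sketched in a few lines of Python. This is an illustration under stated assumptions, not IRIDA's implementation: the role names, method names, and the `Project` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Method-level rules: which operations each role may call at all (hypothetical roles).
ROLE_METHODS: Dict[str, Set[str]] = {
    "lab_technologist": {"upload_sequence", "read_sample"},
    "epidemiologist": {"read_sample"},
}


@dataclass
class Project:
    """Object-level rules: the data producer grants per-user rights on each project."""
    name: str
    grants: Dict[str, Set[str]] = field(default_factory=dict)


def method_allowed(role: str, method: str) -> bool:
    return method in ROLE_METHODS.get(role, set())


def object_allowed(project: Project, user: str, method: str) -> bool:
    return method in project.grants.get(user, set())


def authorize(project: Project, user: str, role: str, method: str) -> bool:
    """A call succeeds only if both the role and the specific object permit it."""
    return method_allowed(role, method) and object_allowed(project, user, method)


outbreak = Project(
    "salmonella_2015",
    {"alice": {"upload_sequence", "read_sample"}, "bob": {"read_sample"}},
)
print(authorize(outbreak, "alice", "lab_technologist", "upload_sequence"))  # True
print(authorize(outbreak, "bob", "epidemiologist", "upload_sequence"))      # False
print(authorize(outbreak, "carol", "epidemiologist", "read_sample"))        # False
```

The third call fails even though the role allows reading, because the data producer has not granted that user access to this particular project.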
Solution 2c: Role-based Dynamic Display driven
by Ontology
• Ontologies often lack a content management system (CMS)
• An Interface Model Ontology (IFM) can define a CMS for an
ontology
Source: Damion Dooley
IFM Interface View Permissions
Detailed View Restricted View
E.g. User role permissions control visibility and editing of content
Source: Damion Dooley
GAP 3: INFORMATION
REPRESENTATION IS INCONSISTENT
There are at least 74 different ways to
say “female” in ENA database
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383942/
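A minimal Python sketch of the problem: free-text values have to be mapped onto one controlled term before records can be compared. The synonym table below is illustrative only and is not taken from the ENA data.

```python
from typing import Optional

# Hypothetical synonym table; a real one would be curated from observed values.
SEX_SYNONYMS = {
    "female": "female", "f": "female", "woman": "female",
    "male": "male", "m": "male", "man": "male",
}


def normalize_sex(raw: str) -> Optional[str]:
    """Map a free-text value to a controlled term, or None if unrecognized."""
    key = raw.strip().lower().rstrip(".")
    return SEX_SYNONYMS.get(key)


print(normalize_sex(" Female "))  # female
print(normalize_sex("F."))        # female
print(normalize_sex("unknown"))   # None
```

Values that return `None` need manual curation, which is the cost that a shared vocabulary or ontology is meant to avoid at data entry time.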
Solution 3a: Use Ontology
• Ontology: a way to describe types of entities
and relations between them
• Why use ontology
– Ontology is flexible and expandable
– Lower levels of expressivity (e.g. controlled vocabulary,
data dictionary) are heavy handed and show low level of
compliance and adoption
– Free text is often used as an alternative, but it is not
computer friendly
– Ontology and semantic web technologies may be a
solution
The Utility of Ontologies in Food-borne Investigations
Example:
Correlate PFGE type SSOXAI.0042 cases between 01 Mar 2015 - 16 Mar 2015 with
Spinach → Leafy Greens → Produce → High-Risk Food Sources and Symptoms of Nausea
and Fever
Ontologist organizes how terms are related in a tree so one can search for terms at different
levels
Provides great information-resolving power!!
High-Risk Food
– Produce: Leafy Greens, Sprouts
– Poultry: Deli Meat, Nuggets
– Seafood: Fish, Shellfish
Source: Emma Griffiths
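Searching "at different levels" of the food tree can be sketched as a walk up a parent map. In this Python illustration the grouping of child terms under Produce, Poultry, and Seafood is inferred from the slide layout, Spinach is placed under Leafy Greens as in the example query, and the case records are invented.

```python
from typing import Dict, List, Optional

# Child term -> parent term (grouping inferred from the slide's tree layout).
PARENT: Dict[str, str] = {
    "Produce": "High-Risk Food", "Poultry": "High-Risk Food", "Seafood": "High-Risk Food",
    "Leafy Greens": "Produce", "Sprouts": "Produce",
    "Deli Meat": "Poultry", "Nuggets": "Poultry",
    "Fish": "Seafood", "Shellfish": "Seafood",
    "Spinach": "Leafy Greens",
}


def is_a(term: str, ancestor: str) -> bool:
    """True if `term` equals `ancestor` or sits anywhere below it in the tree."""
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def cases_exposed_to(cases: List[dict], ancestor: str) -> List[str]:
    """Ids of cases whose recorded food exposure falls under `ancestor`."""
    return [c["id"] for c in cases if is_a(c["food"], ancestor)]


# Hypothetical case records for illustration.
cases = [
    {"id": "case1", "food": "Spinach"},
    {"id": "case2", "food": "Shellfish"},
    {"id": "case3", "food": "Sprouts"},
]
print(cases_exposed_to(cases, "Produce"))         # ['case1', 'case3']
print(cases_exposed_to(cases, "High-Risk Food"))  # ['case1', 'case2', 'case3']
```

The same case records answer both a narrow query (Produce) and a broad one (High-Risk Food) without re-coding the data.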
Many Domains of Knowledge are needed to describe
an outbreak investigation
Build On, Work With:
OBI
TypON
NGSOnto
NIAID-GSC-BRC core metadata
MIxS Ontology
NCBI Biosample etc
TRANS – Pathogen Transmission
EPO
Exposure Ontology
Infectious Disease Ontology
CARD, ARO for AMR
USDA Nutrient DB
EFSA Comp. Food Consump. DB
Example gaps to be filled:
Expand food ontology; expand CARD
AMR data with others.
Lab Checklist/Ontology
• Currently finishing a lab/genomics checklist
• Metadata Domains:
– Sample Collection
– Sample Source
– Environmental
– Lab Analytics
– Sequencing Process /QC
– Sequencing Run /QC
– Assembly Process / QC
– Others overlapping with Epi: Demographic / Geographic / etc.
• Starting an epidemiology checklist to be completed this
year
GAP 4: GENOMIC DATA
INTERPRETATION IS COMPLEX AND
TECHNOLOGY IS EVOLVING
Solution 4a: Use of QA/QC in IRIDA
• Software Engineering
– High quality software that meets regulatory guidelines
– Open Source product to ensure “white box” testing
– Ontology driven software development
– Follow proper software development cycle
• Data Quality
– Built-in modules to check for input data quality
– Warnings and feedback to laboratory technologists during pipeline execution
– Use of Ontology to check the quality of metadata (non-genomic data)
• Analytic Tool Quality
– Utilize validation datasets
– Use of abstract pipeline description – with version control
– Periodic analysis of exceptions and boundary cases to assess tool accuracy
Solution 4b: Generation of validation datasets
To Participate, Contact
Rene Hendriksen
rshe@food.dtu.dk
Or
Errol Strain
Errol.Strain@fda.hhs.gov
http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
NML and BCPHMRL will be
participating in the GMI proficiency
test to compare our genomic
sequencing and analysis protocols
with other labs around the world
Solution 4c: Exploratory tools can access certain
data via REST API securely
http://pathogenomics.sfu.ca/islandviewer
IslandViewer
Dhillon and Laird et al. 2015, Nucleic Acids
Research
http://kiwi.cs.dal.ca/GenGIS
Parks et al. 2013, PLoS One
Availability
• Jun 1 2015: IRIDA 1.0 beta Internal Release
– Release to collaborators for installation and full test
• Jul 1 2015: IRIDA 1.0 beta1
– Announce Beta release, download, documentation
available on website – www.irida.ca
• Aug 1 2015: IRIDA 1.0 beta2
– Cloud installer, with documentation
– Additional pipelines as available
– Visualization as available
Acknowledgements
Project Leaders
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
University of Lisbon
João Carriço
National Microbiology Laboratory (NML)
Franklin Bristow
Aaron Petkau
Thomas Matthews
Josh Adam
Adam Olson
Tarah Lynch
Shaun Tyler
Philip Mabon
Philip Au
Celine Nadon
Matthew Stuart-Edwards
Morag Graham
Chrystal Berry
Lorelee Tschetter
Aleisha Reimer
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada
Peter Kruczkiewicz
Chad Laing
Vic Gannon
Matthew Whiteside
Ross Duncan
Steven Mutschall
Simon Fraser University (SFU)
Melanie Courtot
Emma Griffiths
Geoff Winsor
Julie Shay
Matthew Laird
Bhav Dhillon
Raymond Lo
BC Public Health Microbiology &
Reference Laboratory (PHMRL) and BC
Centre for Disease Control (BCCDC)
Judy Isaac-Renton
Patrick Tang
Natalie Prystajecky
Jennifer Gardy
Damion Dooley
Linda Hoang
Kim MacDonald
Yin Chang
Eleni Galanis
Marsha Taylor
Cletus D’Souza
Ana Paccagnella
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Burton Blais
Catherine Carrillo
Dominic Lambert
Dalhousie University
Rob Beiko
Alex Keddy
McMaster University
Andrew McArthur
Daim Sardar
European Nucleotide Archive
Guy Cochrane
Petra ten Hoopen
Clara Amid
European Food Safety Authority
Leibana Criado Ernesto
Vernazza Francesco
Rizzi Valentina
IRIDA Annual General Meeting
Winnipeg, April 8-9, 2015

Weitere ähnliche Inhalte

Was ist angesagt?

CV_Timothy_Sanchez_Dec2015
CV_Timothy_Sanchez_Dec2015CV_Timothy_Sanchez_Dec2015
CV_Timothy_Sanchez_Dec2015
Timothy Sanchez
 
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression DatabaseКолкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
bigdatabm
 
Candidate 113701 (srg) senior biologist
Candidate 113701 (srg) senior biologistCandidate 113701 (srg) senior biologist
Candidate 113701 (srg) senior biologist
Jonathan Duckworth
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
Helena Deus
 

Was ist angesagt? (20)

Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformatics
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
CV_Timothy_Sanchez_Dec2015
CV_Timothy_Sanchez_Dec2015CV_Timothy_Sanchez_Dec2015
CV_Timothy_Sanchez_Dec2015
 
Free webinar-introduction to bioinformatics - biologist-1
Free webinar-introduction to bioinformatics - biologist-1Free webinar-introduction to bioinformatics - biologist-1
Free webinar-introduction to bioinformatics - biologist-1
 
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
 
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression DatabaseКолкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
 
Bioinformatics in medicine
Bioinformatics in medicineBioinformatics in medicine
Bioinformatics in medicine
 
DSRG report 2001
DSRG report 2001DSRG report 2001
DSRG report 2001
 
NetBioSIG2014-Talk by David Amar
NetBioSIG2014-Talk by David AmarNetBioSIG2014-Talk by David Amar
NetBioSIG2014-Talk by David Amar
 
Applications of Whole Genome Sequencing (WGS) technology on food safety manag...
Applications of Whole Genome Sequencing (WGS) technology on food safety manag...Applications of Whole Genome Sequencing (WGS) technology on food safety manag...
Applications of Whole Genome Sequencing (WGS) technology on food safety manag...
 
Candidate 113701 (srg) senior biologist
Candidate 113701 (srg) senior biologistCandidate 113701 (srg) senior biologist
Candidate 113701 (srg) senior biologist
 
Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
 
Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2
 
Bioinformatics: Building the cornerstones of Sequence Homology and its use fo...
Bioinformatics: Building the cornerstones of Sequence Homology and its use fo...Bioinformatics: Building the cornerstones of Sequence Homology and its use fo...
Bioinformatics: Building the cornerstones of Sequence Homology and its use fo...
 
2014 CrossRef Annual Meeting Keynote: Ways and Needs to Promote Rapid Data Sh...
2014 CrossRef Annual Meeting Keynote: Ways and Needs to Promote Rapid Data Sh...2014 CrossRef Annual Meeting Keynote: Ways and Needs to Promote Rapid Data Sh...
2014 CrossRef Annual Meeting Keynote: Ways and Needs to Promote Rapid Data Sh...
 
Scott Edmunds Open data examples, from the Science as an Open Enterprise sess...
Scott Edmunds Open data examples, from the Science as an Open Enterprise sess...Scott Edmunds Open data examples, from the Science as an Open Enterprise sess...
Scott Edmunds Open data examples, from the Science as an Open Enterprise sess...
 
Anne Krug - Lush Prize Conference 2014
Anne Krug  - Lush Prize Conference 2014Anne Krug  - Lush Prize Conference 2014
Anne Krug - Lush Prize Conference 2014
 
Ai and biology
Ai and biologyAi and biology
Ai and biology
 

Andere mochten auch

Music video inspiration – ‘Chameleon '
Music video inspiration – ‘Chameleon 'Music video inspiration – ‘Chameleon '
Music video inspiration – ‘Chameleon '
Jasrolit
 
Copyof andrarelyjustillnesscornellfocusquestions
Copyof andrarelyjustillnesscornellfocusquestionsCopyof andrarelyjustillnesscornellfocusquestions
Copyof andrarelyjustillnesscornellfocusquestions
brittlee2098
 
Music video inspiration – ‘chameleon'
Music video inspiration – ‘chameleon'Music video inspiration – ‘chameleon'
Music video inspiration – ‘chameleon'
Jasrolit
 
Mark Caulfield (Genomics England) - Understanding how genomics will transform...
Mark Caulfield (Genomics England) - Understanding how genomics will transform...Mark Caulfield (Genomics England) - Understanding how genomics will transform...
Mark Caulfield (Genomics England) - Understanding how genomics will transform...
NHShcs
 

Andere mochten auch (13)

Music video inspiration – ‘Chameleon '
Music video inspiration – ‘Chameleon 'Music video inspiration – ‘Chameleon '
Music video inspiration – ‘Chameleon '
 
Copyof andrarelyjustillnesscornellfocusquestions
Copyof andrarelyjustillnesscornellfocusquestionsCopyof andrarelyjustillnesscornellfocusquestions
Copyof andrarelyjustillnesscornellfocusquestions
 
Portable Partition Screens
Portable Partition ScreensPortable Partition Screens
Portable Partition Screens
 
CV
CVCV
CV
 
History of music videos
History of music videosHistory of music videos
History of music videos
 
Andrew goodwin’s theory
Andrew goodwin’s theoryAndrew goodwin’s theory
Andrew goodwin’s theory
 
2013-20
2013-202013-20
2013-20
 
Sunny Mobile Call Back App
Sunny Mobile Call Back App Sunny Mobile Call Back App
Sunny Mobile Call Back App
 
Music video inspiration – ‘chameleon'
Music video inspiration – ‘chameleon'Music video inspiration – ‘chameleon'
Music video inspiration – ‘chameleon'
 
Lanybook 2016
Lanybook 2016 Lanybook 2016
Lanybook 2016
 
Catálogo high tech solutions
Catálogo high tech solutionsCatálogo high tech solutions
Catálogo high tech solutions
 
Mark Caulfield (Genomics England) - Understanding how genomics will transform...
Mark Caulfield (Genomics England) - Understanding how genomics will transform...Mark Caulfield (Genomics England) - Understanding how genomics will transform...
Mark Caulfield (Genomics England) - Understanding how genomics will transform...
 
Digging into thousands of variants to find disease genes in Mendelian and com...
Digging into thousands of variants to find disease genes in Mendelian and com...Digging into thousands of variants to find disease genes in Mendelian and com...
Digging into thousands of variants to find disease genes in Mendelian and com...
 

Ähnlich wie Grand round whsiao_may2015

IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology
William Hsiao
 

Ähnlich wie Grand round whsiao_may2015 (20)

IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
 
IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao
IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiaoIRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao
IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao
 
2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data
 
IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
 
Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
 
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...
 
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.caGenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
 
Nov 2014 ouellette_windsor_icgc_final
Nov 2014 ouellette_windsor_icgc_finalNov 2014 ouellette_windsor_icgc_final
Nov 2014 ouellette_windsor_icgc_final
 
Provenance abstraction for implementing security: Learning Health System and ...
Provenance abstraction for implementing security: Learning Health System and ...Provenance abstraction for implementing security: Learning Health System and ...
Provenance abstraction for implementing security: Learning Health System and ...
 
Beating Bugs with Big Data: Harnessing HPC to Realize the Potential of Genomi...
Beating Bugs with Big Data: Harnessing HPC to Realize the Potential of Genomi...Beating Bugs with Big Data: Harnessing HPC to Realize the Potential of Genomi...
Beating Bugs with Big Data: Harnessing HPC to Realize the Potential of Genomi...
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Standards for public health genomic epidemiology - Biocuration 2015
Standards for public health genomic epidemiology - Biocuration 2015Standards for public health genomic epidemiology - Biocuration 2015
Standards for public health genomic epidemiology - Biocuration 2015
 
rheumatoid arthritis
rheumatoid arthritisrheumatoid arthritis
rheumatoid arthritis
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?
 
2013-10-23 DTL Next Generation Life Sciences Event, Utrecht
2013-10-23 DTL Next Generation Life Sciences Event, Utrecht2013-10-23 DTL Next Generation Life Sciences Event, Utrecht
2013-10-23 DTL Next Generation Life Sciences Event, Utrecht
 

Mehr von IRIDA_community

Mehr von IRIDA_community (14)

Robertson immemxi final March 2016
Robertson immemxi final March 2016Robertson immemxi final March 2016
Robertson immemxi final March 2016
 
Hetman immem xi final March 2016
Hetman immem xi final March 2016Hetman immem xi final March 2016
Hetman immem xi final March 2016
 
Barker immemxi final March 2016
Barker immemxi final March 2016Barker immemxi final March 2016
Barker immemxi final March 2016
 
Emma FoodON poster3
Emma FoodON poster3Emma FoodON poster3
Emma FoodON poster3
 
Emma Food on workshop allergy_eg
Emma Food on workshop allergy_egEmma Food on workshop allergy_eg
Emma Food on workshop allergy_eg
 
Biocuration gen epio_poster
Biocuration gen epio_posterBiocuration gen epio_poster
Biocuration gen epio_poster
 
Emma Griffiths ASM microbe gen_epio_poster
Emma Griffiths ASM microbe gen_epio_posterEmma Griffiths ASM microbe gen_epio_poster
Emma Griffiths ASM microbe gen_epio_poster
 
Julie Shay CCBC poster may 11 2016
Julie Shay CCBC poster may 11 2016Julie Shay CCBC poster may 11 2016
Julie Shay CCBC poster may 11 2016
 
Integrate Ontologies into your apps
Integrate Ontologies into your appsIntegrate Ontologies into your apps
Integrate Ontologies into your apps
 
Report Calc for Quality Control
Report Calc for Quality ControlReport Calc for Quality Control
Report Calc for Quality Control
 
Irida immemxi hsiao
Irida immemxi hsiaoIrida immemxi hsiao
Irida immemxi hsiao
 
Gen epio immem_griffiths
Gen epio immem_griffithsGen epio immem_griffiths
Gen epio immem_griffiths
 
Irida bccdc dec10_2015
Irida bccdc dec10_2015Irida bccdc dec10_2015
Irida bccdc dec10_2015
 
Domselaar GMI8 Beijing Canadian WGS Surveillance Experience
Domselaar GMI8 Beijing Canadian WGS Surveillance ExperienceDomselaar GMI8 Beijing Canadian WGS Surveillance Experience
Domselaar GMI8 Beijing Canadian WGS Surveillance Experience
 
Grand round whsiao_may2015

  • 1. How Can We Make Genomic Epidemiology a Widespread Reality? William Hsiao, Ph.D. William.hsiao@bccdc.ca @wlhsiao BC Public Health Microbiology and Reference Laboratory BCCDC Grand Round May 26 2015
  • 2. Outline • Part 1: What is genomic epidemiology and Why is it important for public health microbiology • Part 2: What are the requirements to bring genomic epidemiology to routine public health practice – Introducing our project IRIDA as part of the solution
  • 3. Source: Peter Gleick, Scienceblogs.com
  • 7. Molecular Epidemiology • Laboratory generated biomarker results can be correlated to epidemiological investigations (People, Place, Time) • Provides linkage based on common exposure to the same pathogen at the molecular level • Most tests detect one or a few specific biomarkers, representing a fraction of the pathogens’ genetic information
  • 8. Current Methods of Characterizing Foodborne Pathogens in a Public Health Laboratory • Growth characteristics • Phenotypic panels • Agglutination reactions • Enzyme immuno assays (EIAs) • PCR • DNA arrays (hybridization) • Sanger sequencing of marker genes • DNA restriction • Electrophoresis (PFGE, capillary) Each pathogen is characterized by methods that are specific to that pathogen in multiple workflows (separate workflows for each pathogen) TAT: 5 min – weeks (months) Source: Rebecca Lindsey
  • 9. Genomic Epidemiology Def: Using whole genome sequencing data from pathogens and epidemiological investigations to track spread of an infectious disease
  • 10. Why Genomic Epidemiology • One technology (DNA sequencing) compatible with many types of pathogens • Capable of generating 10-1000s of high quality pathogen genomes within 1-7 days
  • 11. Sequencing = lots of HQ Data • Capture the pathogen’s entire genetic makeup • Unbiased (~97-99+% of the genome captured using common sequencing approaches) • Significantly more data than traditional methods • Allow higher resolution and higher sensitivity methods to be applied • Allow value-added evolutionary & Functional study of the pathogens – Virulence factors – AMR genes
  • 12. Sequencing cost continues to drop: from $100M per human genome to $10K per human genome or $10 per bacterial genome
  • 13. Variations in genomes = Basis of Comparison • Mutations – Point mutations – Small insertions and deletions (indels) – Can change functions of a gene • Recombination, deletion, and duplication – Rearrange genes, can change expression – Increase gene copy number – Delete genes • Horizontal gene transfer – Acquiring genetic material from non-parental organism • E.g. Antibiotic resistance / new toxins
  • 14. SNP Analysis • What is a SNP? – A SNP (single nucleotide polymorphism) is a DNA sequence variation occurring when a single nucleotide differs between two or more genomes ATCGCGATATCATACGG ATCGCAATATCATACGG ATCGCGATATCATACGG ATCGCGATATCATACGG ATCGCAATATCATACGG • A SNP can be created from a point mutation but can also be created from insertion and deletion of one nucleotide
  • 15. Why are SNPs useful • Silent mutations that do not change protein sequences happen quite frequently due to DNA replication errors => High Resolution • SNPs occur across the whole genome and can be detected from whole genome sequencing => Unbiased markers • SNPs can also be used to infer phylogeny of organisms – More shared SNPs = more closely related
  • 16. SNP Minimal Spanning Tree – colored by Phage Type PT8 PT4 PT13a PT52 The most similar isolates are connected first => clustering them together
  • 17. SNP Minimal Spanning Tree – colored by outbreaks
  • 18. Many phylogenetic trees based on SNPs published to show clustering of outbreak cases den Bakker et al. Emerg Infect Dis. 2014 Aug;20(8) Non-related cases Outbreak cases Allard, M et al. PLoS ONE 8 (1) 2013
  • 19. Forces Driving Pathogen Genome Evolution Specialization “lean and mean” New function can be derived through: Gene expression can be turned on and off
  • 20. Intra-cluster distances overlap with inter-cluster distances Leekitcharoenphon, et al. 2014. PLoS ONE 9 (2). doi:10.1371/journal.pone.0087991.
  • 21. Different species have different clustering distances Leekitcharoenphon, et al. 2014. PLoS ONE 9 (2). doi:10.1371/journal.pone.0087991.
  • 22. Genomics + Epidemiology • Having genetic distance information alone may not be enough to fully characterize outbreaks • Need to combine with epidemiological investigations • Using known clusters to establish (sub-)species-specific genetic distance criteria • Genomics can help connect previously unlinked cases to uncover new cases
  • 23. Each year, one in eight Canadians (or four million people) get sick with a domestically acquired food-borne illness. http://www.phac-aspc.gc.ca/efwd-emoha/efbi-emoa-eng.php
  • 24. Whole Genome Sequencing of Foodborne Pathogens Around the World • UK Public Health England committed to sequence all the Salmonella isolates submitted to PH Lab • US FDA and CDC (supported by National Center for Biotechnology Information) created a distributed network of labs to utilize WGS for pathogen identification https://publichealthmatters.blog.gov.uk/2014/01/20/innovations-in-genomic-sequencing/ http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm
  • 25. Genome Canada Bioinformatics Competition: Large-Scale Project “A Federated Bioinformatics Platform for Public Health Microbial Genomics” Our Goal: The IRIDA platform (Integrated Rapid Infectious Disease Analysis): an open source, standards compliant, high quality genomic epidemiology analysis platform based on web-technology to support real-time (food-borne) disease outbreak investigations. www.IRIDA.ca
  • 26. Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and application to real-life and real-time use cases in public health agencies - Project Team has direct access to state of the art research in academia - Project Team is directly embedded in user organization National Public Health Agency Provincial Public Health Agency Academic/Public
  • 27. IRIDA Project Phases • Phase 1: genomics process and analysis pipeline to produce categorical data (MLST and SNPs) suitable for current epidemiological analysis – almost completed • Phase 2: combine the categorical data with epidemiological data (line list approach to replace current Excel based approach) – in progress • Phase 3: Develop IRIDA as an exploratory platform for new ways of interpreting genomics data in light of epidemiological and clinical data – in progress; continuous process beyond current project
  • 28. Interviews with key personnel to identify barriers to implement genomic epidemiology in public health agencies
  • 29. GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS
  • 30. Microbial genomics has been a valuable research tool • Help us understand: – microbial evolution – pathogenesis – create novel industrial processes – create new laboratory tests • Use historical isolates – not real time • Use of laboratory strains – no associated rich clinical and epidemiological metadata
  • 31. Cultural and Practical Differences Genomics Research Laboratory Genomics Diagnostic Laboratory Curiosity driven Production / Case driven Exploratory analysis tolerated Exploratory analysis discouraged Reproducibility = other labs’ problem Reproducibility critical Tweaking protocols desirable Stability in protocols desirable Protocols don’t need to be validated Protocols need to be validated Novelty justifies the high cost of experiment Conscious of cost per unit test; tests need to be scalable How do we bridge the cultural and the practical differences?
  • 32. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data • Carefully designed and engineered software platform is just the starting point… User Interface Security File system Metadata Storage Application logic REST API Workflow Execution Manager Continuous Integration Documentation
  • 33. • Easy to use interface hiding the technical details Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
  • 34. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
  • 35. Solution 1b: Build Portable and Transparent Pipelines • Use Galaxy as workflow engine – large community support • Retooled to address usability, security, and other limitations • Version Controlled Pipeline Templates • Input files, parameters, and workflow are sent to IRIDA-specific Galaxy for execution • Results and provenance information are copied from Galaxy. Diagram steps: 1. Input files sent to Galaxy 2. Tools executed on Galaxy workers 3. Results downloaded from Galaxy. Diagram components: IRIDA UI/DB, REST API, Galaxy (Assembly Tools, Variant Calling Tools, …), Shared File System, Workers. Source: Franklin Bristow
  • 36. Solution 1c: Start the training NOW! • Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators • At PHMRL, we have been conducting workshops to train technologists and researchers on some common genomic analysis tools • IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016 • We would like to engage the epidemiologists in the future for training purposes as well
  • 37. GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC
  • 38. Many Players in surveillance and outbreak – ineffective information sharing Source: M. Taylor, BCCDC Provincial public health dept. National laboratory Local public health dept. Provincial laboratory Cases Physicians Frontline lab Information Bioinformatics and Analytical Capacities
  • 39. Many Systems used in Reporting Diseases – require data re-entry and re-coding National Ministry of Health Provincial public health dept. National laboratory Local public health dept. Provincial laboratory Cases Physicians Local laboratory Fax/Electronic Fax Phone/Fax Electronic/Paper Electronic/Fax/Phone Mailing of Samples/Fax/Electronic Source: M. Taylor, BCCDC
  • 40. Semantic Web Credit: http://www.cs.rpi.edu/~hendler/ • Semantic web is a suitable technology framework to organize and share arbitrary datasets
  • 41. What’s the web? • World-Wide-Web (WWW) is a platform where – Information is distributed (CBC for news, Netflix for Movies, etc.) – Information is heterogeneous (text, video, pictures) – (relevant) Information is linked by hyperlinks – Often, information is only human readable – Often, information is incorrect – Often, information is not attributed
  • 42. What’s Semantic web? • Semantic web inherits many of the (good) attributes of WWW (distributed, open, heterogeneous, and linked) • It’s designed to be: – machine readable based on a common language of logic – Linking information can be automated making data sharing easier – Easier to describe granular data – Errors can be detected based on logical reasoning – Information can be attributed and can be made to persist – “Smart Web”
  • 43. IRIDA uses semantic web technologies to address information management issues • Solutions: – 2a: Localized Instance of federated databases – 2b: Permission Control – authentication /authorization for information sharing – 2c: User role-based display of information
  • 44. Solution 2a: Local/Cloud Instances and Data Federation • Data processing capacity pushed to data generating labs • Allow data sharing securely for enhanced analysis • Eventually cultivating a culture of openness of data sharing and collaborative development of tools
  • 45. Authorization Solution 2b: Security • Local authorization per instance. • Method-level authorization. • Object-level authorization. • Allow secure, fine grained and flexible information sharing controlled by data producer
  • 46. Solution 2c: Role-based Dynamic Display driven by Ontology • Ontologies often lack a content management system (CMS) • An Interface Model Ontology (IFM) can define a CMS for an ontology Source: Damion Dooley
  • 48. IFM Interface View Permissions Detailed View Restricted View E.g. User role permissions control visibility and editing of content Source: Damion Dooley
  • 50. There are at least 74 different ways to say “female” in ENA database http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383942/
  • 51. Solution 3a: Use Ontology • Ontology: a way to describe types of entities and relations between them • Why use ontology – Ontology is flexible and expandable – Lower levels of expressivity (e.g. controlled vocabulary, data dictionary) are heavy handed and show low level of compliance and adoption – Free text is used as an alternative but is not computing friendly – Ontology and semantic web technologies may be a solution
  • 52. The Utility of Ontologies in Food-borne Investigations Example: Correlate PFGE type SSOXAI.0042 cases between 01 Mar 2015 and 16 Mar 2015 with Spinach → Leafy Greens → Produce → High-Risk Food Sources and Symptoms of Nausea and Fever Ontologist organizes how terms are related in a tree so one can search for terms at different levels Provides great information-resolving power!! High-Risk Food Produce Poultry Seafood Leafy Greens Sprouts Deli Meat Nuggets Fish Shellfish Source: Emma Griffiths
  • 53. Many Domains of Knowledge are needed to describe an outbreak investigation Build On, Work With: OBI TypON NGSOnto NIAID-GSC-BRC core metadata MIxS Ontology NCBI Biosample etc TRANS – Pathogen Transmission EPO Exposure Ontology Infectious Disease Ontology CARD, ARO for AMR USDA Nutrient DB EFSA Comp. Food Consump. DB Example gaps to be filled: Expand food ontology; expand CARD AMR data with others.
  • 54. Lab Checklist/Ontology • Currently finishing a lab/genomics checklist • Metadata Domains: – Sample Collection – Sample Source – Environmental – Lab Analytics – Sequencing Process /QC – Sequencing Run /QC – Assembly Process / QC – Others overlapping with Epi: Demographic / Geographic / etc. • Starting an epidemiology checklist to be completed this year
  • 55. GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING
  • 56. Solution 4a: Use of QA/QC in IRIDA • Software Engineering – High quality software that meets regulatory guidelines – Open Source product to ensure “white box” testing – Ontology driven software development – Follow proper software development cycle • Data Quality – Built-in modules to check for input data quality – Warnings and feedback during pipeline execution to laboratory technologists – Use of Ontology to check metadata (non-genomic) data quality • Analytic Tool Quality – Utilize validation datasets – Use of abstract pipeline description – with version control – Periodic analysis of exceptions and boundary cases to assess tool accuracy
  • 57. Solution 4b: Generation of validation datasets To Participate, Contact Rene Hendriksen rshe@food.dtu.dk Or Errol Strain Errol.Strain@fda.hhs.gov http://www.globalmicrobialidentifier.org/Workgroups#work-group-4 NML and BCPHMRL will be participating in the GMI proficiency test to compare our genomic sequencing and analysis protocols with other labs around the world
  • 58. Solution 4c: Exploratory tools can access certain data via REST API securely http://pathogenomics.sfu.ca/islandviewer IslandViewer Dhillon and Laird et al. 2015, Nucleic Acids Research http://kiwi.cs.dal.ca/GenGIS Parks et al. 2013, PLoS One
  • 59. Availability • Jun 1 2015: IRIDA 1.0 beta Internal Release – Release to collaborators for installation and full test • Jul 1 2015: IRIDA 1.0 beta1 – Announce Beta release, download, documentation available on website – www.irida.ca • Aug 1 2015: IRIDA 1.0 beta2 – Cloud installer, with documentation – Additional pipelines as available – Visualization as available
  • 60. Acknowledgements Project Leaders Fiona Brinkman – SFU Will Hsiao – PHMRL Gary Van Domselaar – NML University of Lisbon João Carriço National Microbiology Laboratory (NML) Franklin Bristow Aaron Petkau Thomas Matthews Josh Adam Adam Olson Tarah Lynch Shaun Tyler Philip Mabon Philip Au Celine Nadon Matthew Stuart-Edwards Morag Graham Chrystal Berry Lorelee Tschetter Aleisha Reimer Laboratory for Foodborne Zoonoses (LFZ) Eduardo Taboada Peter Kruczkiewicz Chad Laing Vic Gannon Matthew Whiteside Ross Duncan Steven Mutschall Simon Fraser University (SFU) Melanie Courtot Emma Griffiths Geoff Winsor Julie Shay Matthew Laird Bhav Dhillon Raymond Lo BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC) Judy Isaac-Renton Patrick Tang Natalie Prystajecky Jennifer Gardy Damion Dooley Linda Hoang Kim MacDonald Yin Chang Eleni Galanis Marsha Taylor Cletus D’Souza Ana Paccagnella University of Maryland Lynn Schriml Canadian Food Inspection Agency (CFIA) Burton Blais Catherine Carrillo Dominic Lambert Dalhousie University Rob Beiko Alex Keddy McMaster University Andrew McArthur Daim Sardar European Nucleotide Archive Guy Cochrane Petra ten Hoopen Clara Amid European Food Safety Agency Leibana Criado Ernesto Vernazza Francesco Rizzi Valentina
  • 61. IRIDA Annual General Meeting Winnipeg, April 8-9, 2015
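
The SNP slides (14 to 16) can be made concrete with a small sketch. It takes the five aligned sequences shown on slide 14, finds the columns where they differ, counts pairwise SNP differences, and builds a minimum spanning tree in which the most similar isolates are connected first. This is only an illustration under simplifying assumptions: the function names are invented here, the sequences are treated as already aligned, and it is not the IRIDA pipeline, which calls variants from sequencing reads.

```python
from itertools import combinations

# The five aligned sequences shown on the "SNP Analysis" slide.
SEQUENCES = [
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
]


def snp_positions(aligned):
    """Return 0-based columns where the aligned sequences are not all identical."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


def snp_distance(a, b):
    """Count the positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def minimum_spanning_tree(aligned):
    """Kruskal's algorithm: the most similar isolates are connected first."""
    edges = sorted(
        (snp_distance(aligned[i], aligned[j]), i, j)
        for i, j in combinations(range(len(aligned)), 2)
    )
    parent = list(range(len(aligned)))

    def find(x):
        # Follow parent links to the representative of x's cluster.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # only join isolates in different clusters
            parent[root_i] = root_j
            tree.append((i, j, dist))
    return tree


if __name__ == "__main__":
    print("SNP columns:", snp_positions(SEQUENCES))
    for i, j, dist in minimum_spanning_tree(SEQUENCES):
        print(f"isolate {i} -- isolate {j}: {dist} SNP(s)")
```

Identical isolates are joined at distance zero, and the two groups are then linked by the single edge that carries the one SNP. As slides 20 to 22 caution, distance alone does not define an outbreak, so any clustering threshold would still need to be set per (sub-)species and combined with epidemiological data.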

Editor's notes

  1. Today, I’d like to tell you a bit about some of Canada’s efforts to build a genomic epidemiology analysis platform
  2. This is John Snow’s famous map. On it, I’ve colored in red his column of bars, each of which represents a cholera death. I’ve also circled in blue the local water pumps, including the Broad Street pump — servicing the well that was the source of cholera. In a now legendary experiment in 1854, Dr. John Snow, a London physician, conducted a simple yet brilliant test that helped to settle the debate about the transmission of cholera. Snow drew a map [see Figure 2 below] of a virulent cholera outbreak in one of the poorest neighborhoods of London – served by central wells and no sewage collection. He plotted the homes and numbers of people affected, and in a flash of insight, mapped the location of the wells that provided water for the hardest hit neighborhoods. The maps he generated and the interviews he conducted with the families of victims convinced him that the source of contamination was the water from the Broad Street well. He received permission from local authorities to remove the pump, which forced residents to go to other, uncontaminated wells for water. Within days, the outbreak subsided.” [from “Bottled and Sold: The Story Behind Our Obsession with Bottled Water” Island Press, Washington DC.]
  3. Having the ability to identify clusters of cases in the population is critical to allow us to understand what is happening and to track the cause of the outbreak
  4. Differentiating strains of organisms in the population can tell us who carries the same pathogen. This is achievable via lab techniques that aim at subtyping pathogens
  5. If we can overlay additional information, such as exposure – did they eat in the same restaurant? – we can then track the source of the outbreak.
  6. Add source
  7. In terms of cost
  8. Despite our high standard in food safety, each year one in eight Canadians gets food poisoning, costing the economy $4 billion. It is important to track the source and spread of the disease to prevent further sickness
  9. IRIDA was conceived about 2 years ago through a Genome Canada Bioinformatics Grant. It is an effort to build an open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time disease outbreak investigations, initially focused on food-borne illnesses
  10. IRIDA is a partnership among provincial public health agencies, national public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and real-life and real-time use cases in public health agencies. The project team has direct access to state of the art research in academia, and the project team is directly embedded in the user organization.
  11. Since we have access to the end users, we conducted interviews with these subject experts to identify the barriers to the uptake of genomic epidemiology in public health agencies. We interviewed epidemiologists, lab scientists and technologists, medical microbiologists and lab administrators. So for the rest of the presentation, I’ll talk about some of the gaps we identified and how IRIDA can meet the requirements.
  12. The first gap, which should not be a surprise to this audience, is that public health workers are mostly unfamiliar with genomics and the bioinformatics analysis needed to process and interpret genomic data
  13. While we do believe that, in the long run, adequate training in genomics is needed to bridge this gap, in the short term a high quality analysis platform that automates data processing and has consistent analysis protocols will help to ease the transition. However, a carefully designed and engineered software platform is just the starting point, and there will no doubt be many similar platforms to choose from. So I will touch on some of the more interesting design philosophies we have for IRIDA.
  14. We found that in the diagnostic testing world, complex procedures with lots of options lead to more human errors and more non-compliance. So, one design solution that we stress is to have a simple user interface that hides the technical details. This solution of course can’t stand on its own, and I’ll describe measures to ensure that flexibility and scientific rigor can be maintained
  15. We think a user interface should be like a joke… If you have to explain it, then it’s not good. That said, we do have extensive documentation for the administrators and accreditation auditors who don’t like jokes :P
  16. The next solution is to leverage Galaxy, which has large community support and a large user base, as our pipeline engine. We had to retool Galaxy extensively to address usability, security and other limitations. To achieve this, we built the IRIDA platform on top of the Galaxy engine: input files, parameters and workflows are sent to Galaxy for execution, and results and pipeline provenance information are copied back into the IRIDA database for storage and archiving
  17. To address the knowledge gaps in genomics, we have started training our public health lab workers on genomic analysis. We would like to hear about other training initiatives and will be happy to share our experience and training material
  18. The second gap that we identified is that sharing of information within and between organizations is highly inefficient and often involves sharing of Excel files with deleted columns to hide sensitive information
  19. There are many players involved in infectious disease surveillance and outbreak investigation. However, concerns with privacy and confidentiality (both founded and unfounded) mean that information tends to be aggregated and lost as we move from the frontline labs to public health and reference labs. Yet the bioinformatics and analytical capacities are the most abundant in central labs and academia
  20. Moreover, different institutions have different software, and often data is exported and printed, faxed, then re-imported to a new system by re-typing! This is a huge waste of time and a source of errors
  21. IRIDA has a few designs to deal with these issues, and I’ll highlight 3 here.
  22. First we propose that we should push the data processing capacity to the periphery where data is the richest by encouraging local or private cloud instances of the IRIDA platform. This way our partners would not be obligated to give up their data. The different instances are connected via a federated database schema. Data can then be shared securely and easily to allow enhanced analysis to be done by genomic experts located centrally. The more we share successfully, the more likely people will realize the benefit in sharing and this can lead to a new culture of openness
  23. Second we have built-in mechanisms for authentication and authorization at different levels to allow secure and fine grained information sharing. This would allow parties to customize the data they share per material and data transfer agreements
  24. Third, we realized we need to have a flexible user interface to present the data. Therefore, we are in the process of developing an interface model ontology which defines a content management system.
  25. As an example, based on the user’s role, they will be able to see the content of the database displayed differently.
  26. The third gap we identified is that information representation is inconsistent across organizations
  27. Given the richness and complexity of genomic epidemiological data, we opt to use and develop ontologies compliant with OBO Foundry to describe the data. Currently, lower levels of expressivity such as controlled vocabularies and data dictionaries are used, but they tend to be heavy handed and show low levels of compliance and adoption. We believe ontology and semantic web technologies can make data sharing across heterogeneous systems and platforms more tractable.
  28. There are many domains of knowledge needed to describe an outbreak investigation and we strive to re-use existing standards as much as possible
  29. Currently we are finishing a lab/genomic checklist and will be starting an epidemiology checklist soon
  30. Lastly, as Jon and others mentioned yesterday, genomic data interpretation is complex and the technology is still evolving, yet in the world of the diagnostic lab, accreditation means standardized protocols need to be developed
  31. So we focus quite a bit of our energy on developing high quality software with built-in QA and QC components to assess data quality and analytic tool performance
  32. I also want to highlight GMI’s WG4’s effort in developing proficiency tests for wet lab and analysis pipelines. To participate you can contact Rene or Errol.
  33. To facilitate tool improvement and to allow exploratory analysis not part of IRIDA pipelines to be done, we would also allow pre-authorized tools to connect to IRIDA via a REST API securely. Currently we have two external tools for genomic island detection and phylogeography analysis.
  34. The software will be released to a few international collaborators for full testing by Jun 1. Then on Jul 1, we plan to release the beta version publicly so people can try it out. Of course the software will be free, and we would love to collaborate with people on both the software and the ontology development.
  35. Large Group of People who contributed to this work
  36. We also have a wonderful group of advisors
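
The ontology-driven search from slide 52 (Spinach, Leafy Greens, Produce, High-Risk Food) can be sketched in a few lines. The sketch below stores each term's parent and answers "which cases were exposed to anything under this term?" at any level of the tree. It rests on assumptions: the parent assignments are read off the flattened slide diagram and may not match the original figure, the case records are invented purely for illustration, and a real system would use an OBO Foundry ontology rather than a hand-written dictionary.

```python
# Child -> parent links, read off the slide-52 tree (assumed layout).
PARENT = {
    "Spinach": "Leafy Greens",
    "Leafy Greens": "Produce",
    "Sprouts": "Produce",
    "Deli Meat": "Poultry",
    "Nuggets": "Poultry",
    "Fish": "Seafood",
    "Shellfish": "Seafood",
    "Produce": "High-Risk Food",
    "Poultry": "High-Risk Food",
    "Seafood": "High-Risk Food",
}

# Hypothetical case records, invented for this example only.
CASES = [
    {"case_id": "case-1", "exposure": "Spinach"},
    {"case_id": "case-2", "exposure": "Sprouts"},
    {"case_id": "case-3", "exposure": "Shellfish"},
]


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, query):
    """True if `term` is `query` itself or falls anywhere beneath it."""
    return term == query or query in ancestors(term)


def cases_exposed_to(cases, query):
    """Return the IDs of cases whose exposure falls under `query`."""
    return [c["case_id"] for c in cases if is_a(c["exposure"], query)]


if __name__ == "__main__":
    print(ancestors("Spinach"))
    print(cases_exposed_to(CASES, "Produce"))
    print(cases_exposed_to(CASES, "High-Risk Food"))
```

Because each record stores only the most specific term, the same data can be queried at the level of "Spinach", "Produce" or "High-Risk Food" without re-coding, which is the information-resolving power the slide refers to.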