2015 06-12-beiko-irida-big data

“All of your answers are approximate,
you might as well live with it…”
Andrew Rau-Chaplin, 1½ hours ago

Integrated Rapid Infectious Disease Analysis
www.irida.ca
Rob Beiko
Faculty of Computer Science
Dalhousie University
June 12
Microbial genomics
for rapid investigation
of infectious disease
Image © Kenneth Todar

VIRUS: Influenza A
RNA genome (14,000 nucleotides)
Eight segments
(Image: Tao and Zheng, Science 2012)
BACTERIUM: S. Typhi CT18
DNA genome (~5,100,000 nucleotides)
One chromosome + two plasmids
Science (2001)

Outbreak
investigation
Similarities: place, time, genetics
fda.gov
2014
2010-2013
Inns et al. (2015)

Outbreak investigation in Canada
NATIONAL MICROBIOLOGY LABORATORY
PROVINCIAL PUBLIC
HEALTH LABORATORIES
CLINICAL ISOLATES
SENTINEL SURVEILLANCE
(FoodNet Canada)
CLINICAL, FOOD,
ENVIRONMENTAL
CANADIAN FOOD
INSPECTION AGENCY
(Regulatory)
FOOD ISOLATES
LISTERIA - E. COLI O157:H7 - SALMONELLA - SHIGELLA
PFGE/MLVA
PUBLIC HEALTH ACTION

Pulsed Field Gel Electrophoresis
Serratia - NICU
Jang et al., J Hosp Infect (2001)

DNA Sequencing
1977: Sanger sequencing
< 1 kilobase per run
$2 / run, 1-3 hours (96 in parallel)
Tiny pieces (600 – 1000 bases)
2011: Illumina MiSeq
15 gigabases per run
$1000 - $1500 / run, 1 day
Tinier pieces (150 – 400 bases)
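
The per-run figures on this slide can be turned into a rough cost comparison. The sketch below is only back-of-envelope arithmetic: it treats "< 1 kilobase" as exactly 1,000 bases (an upper bound, so the Sanger cost is a lower bound), ignores the 96-way parallelism, and borrows the approximate S. Typhi CT18 genome size from the earlier slide. The function and constant names are illustrative.

```python
# Figures taken from the slides; bounds and ranges are simplified as noted.
SANGER_BASES_PER_RUN = 1_000            # slide says "< 1 kilobase": an upper bound
SANGER_COST_PER_RUN = 2.0               # dollars
MISEQ_BASES_PER_RUN = 15_000_000_000    # 15 gigabases
MISEQ_COST_PER_RUN = (1000.0, 1500.0)   # dollars, low and high end of the range
S_TYPHI_GENOME = 5_100_000              # approximate, from the S. Typhi CT18 slide


def cost_per_megabase(cost: float, bases: int) -> float:
    """Dollars spent per million sequenced bases."""
    return cost / (bases / 1_000_000)


def fold_coverage(bases: int, genome_size: int) -> float:
    """How many times one run's output could cover a genome of this size."""
    return bases / genome_size


sanger = cost_per_megabase(SANGER_COST_PER_RUN, SANGER_BASES_PER_RUN)
miseq_low = cost_per_megabase(MISEQ_COST_PER_RUN[0], MISEQ_BASES_PER_RUN)
miseq_high = cost_per_megabase(MISEQ_COST_PER_RUN[1], MISEQ_BASES_PER_RUN)
coverage = fold_coverage(MISEQ_BASES_PER_RUN, S_TYPHI_GENOME)

print(f"Sanger: at least ${sanger:,.0f} per megabase")
print(f"MiSeq:  ${miseq_low:.3f} to ${miseq_high:.3f} per megabase")
print(f"One MiSeq run is roughly {coverage:,.0f}x an S. Typhi genome")
```

In practice a run is shared across many isolates, so the raw coverage figure mostly indicates how many genomes could be multiplexed on one run.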

MiSeq projects at Dalhousie
• Bedford Basin microbial monitoring
• Pediatric Crohn’s disease samples
• Global microbial air sampling
• Mink genomes
• Sequencing Lactobacillus genomes from the poop of
old mice
• Wastewater diversity and function in the Arctic
• Verifying ingredients in dog food
• Exercise and the Microbiome

Integrated Rapid Infectious Disease Analysis
www.irida.ca
• 1.56M, 3-year Genome Canada Large-Scale Applied Platform Grant
• SFU / BCCDC / PHAC-NML / Dalhousie
• DNA sequencing and downstream applications
  • data management / federation
  • analysis workflows
  • ontologies
  • APIs
  • 3rd-party applications
• Implementation in provincial public health labs
• Training

• Ontologies and data standards
  • NCBI, MiXS, vegetables
• Metadata
  • Data provenance
  • Data quality
  • Environmental information

Data sharing!
• BIG challenges – different jurisdictions,
“ownership” of epi data. Privacy!
• Health service providers – concerns about
privacy and data breach
• Technology outstrips policy
• What digital records could we get TODAY?
• Canada lagging in data sharing

• Calling isolates based on genetic variation
• Traditional:
  • Pulsed-field
  • Multi-locus (standards! mlst.net)
• Whole genomes:
  • Lots of information!
  • Too much information!
  • Lots of filtering and quality control required

• Workflow management
• REST-like API (3rd-party applications)
• Security: authentication / authorization
• Data models & implementation

Local Storage
Remote APIs
IRIDA’s Federated Design
List Samples
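
The labels on this slide suggest that a "list samples" request is answered from local storage and from remote APIs together. A minimal sketch of that idea follows. Every class and function name here is hypothetical and is not IRIDA's actual API, and the remote source is an offline stub rather than a real authenticated HTTP client.

```python
from typing import Iterable, List, Protocol, Tuple


class SampleSource(Protocol):
    """Anything that can report the sample IDs it holds (hypothetical interface)."""
    name: str

    def list_samples(self) -> Iterable[str]: ...


class LocalStorage:
    """Samples held by this installation."""

    def __init__(self, name: str, sample_ids: Iterable[str]) -> None:
        self.name = name
        self._sample_ids = list(sample_ids)

    def list_samples(self) -> Iterable[str]:
        return list(self._sample_ids)


class RemoteAPI(LocalStorage):
    """Stand-in for another installation reached over its API.

    A real client would make an authenticated request here; this stub just
    returns canned IDs so the example runs offline.
    """


def federated_list(sources: Iterable[SampleSource]) -> List[Tuple[str, str]]:
    """List samples across all sources, tagging each ID with where it lives."""
    return [(source.name, sample_id)
            for source in sources
            for sample_id in source.list_samples()]


sources = [
    LocalStorage("local", ["F14231", "F14235"]),
    RemoteAPI("remote-lab", ["M6542"]),
]
for origin, sample_id in federated_list(sources):
    print(f"{sample_id}\t{origin}")
```

Keeping the origin next to each sample ID means the data stays with its owner while the listing still looks unified to the caller.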

• Each pipeline is implemented as a Galaxy workflow
• Internal analysis pipelines
  • Assembly and annotation
  • Phylogenetics
  • “Line list” management
• 3rd-party applications

Single-Nucleotide Variant Phylogenetic Pipeline (SNVPhyl)
Sampled genomes → Quality control → Tree generation / visualization
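
To illustrate the kind of quantity a single-nucleotide variant pipeline feeds into tree building, here is a toy pairwise SNV distance calculation. It is not SNVPhyl's implementation: it assumes the sequences are already aligned to equal length, and it simply skips any position where either isolate has an ambiguous call. The isolate names and sequences are made up.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snv_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions with an ambiguous call (anything other than A/C/G/T) in either
    sequence are ignored, a crude stand-in for quality filtering.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNV distances for every pair of isolates, keyed by name pair."""
    return {
        (p, q): snv_distance(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    }


# Toy alignment (invented data): 'N' marks a position that failed quality control.
aligned = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACTTACGNAT",
}

for pair, dist in distance_matrix(aligned).items():
    print(pair, dist)
```

A distance matrix like this is one possible input to a tree-building step. Real pipelines also have to handle read mapping, variant calling, and masking of unreliable regions before distances mean anything.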

GenGIS
Data from Haiti cholera outbreak, 2010
http://kiwi.cs.dal.ca/GenGIS

IslandViewer
http://www.pathogenomics.sfu.ca/islandviewer/browse

• Interfaces / environment
• Personas
  • Researchers
  • Epidemiologists
  • Clinical microbiologists / lab technicians
• Workflow design and execution

Full Privileges
Cluster  Line List ID  Patient Name   Prov. Health No.  Age  Sex  Location   Sample ID  Collection Date  Culture Result
A        1             John Smith     4513253244        26   M    Vancouver  F14231     14/03/21         Salmonella sp.
A        2             Sally Smith    4519567458        24   F    Vancouver  F14235     14/03/21         Salmonella sp.
B        3             Tom Jones      4517543216        35   M    Vancouver  M6542      14/03/24         Salmonella sp.
B        4             Helen Jones    9856321124        35   F    Vancouver  S1245      14/03/22         Salmonella sp.
C        5             Jennifer Lee   4516853122        29   F    Vancouver  S5642      14/03/22         Salmonella sp.
C        6             Michael Brown  9456534561        45   M    Victoria   T68954     14/03/25         Salmonella sp.
[Figure: Phylogenetic Tree, Genetic Distance]

Limited Privileges
Cluster  Line List ID  Patient Name   Prov. Health No.  Age  Sex  Location   Sample ID  Collection Date  Culture Result
A        1             John Smith     4513253244        26   M    Vancouver  F14231     14/03/21         Salmonella sp.
A        2             Sally Smith    4519567458        24   F    Vancouver  F14235     14/03/21         Salmonella sp.
B        3             Tom Jones      4517543216        35   M    Vancouver  M6542      14/03/24         Salmonella sp.
B        4             Helen Jones    9856321124        35   F    Vancouver  S1245      14/03/22         Salmonella sp.
C        5             Jennifer Lee   4516853122        29   F    Vancouver  S5642      14/03/22         Salmonella sp.
C        6             Michael Brown  9456534561        45   M    Victoria   T68954     14/03/25         Salmonella sp.
[Figure: Phylogenetic Tree, Genetic Distance]
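
The contrast between the full-privilege and limited-privilege line lists can be sketched as field-level masking by role. The extracted text does not show which columns the limited view actually hides, so the choice of restricted fields below (patient name and provincial health number) is an assumption made for illustration, as are all the names in the code.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class LineListEntry:
    cluster: str
    line_list_id: int
    patient_name: str
    health_no: str
    age: int
    sex: str
    location: str
    sample_id: str
    collection_date: str
    culture_result: str


# ASSUMPTION: the slide text does not say which columns are hidden; direct
# identifiers are a plausible guess and nothing more.
RESTRICTED_FIELDS = {"patient_name", "health_no"}


def view_for(entry: LineListEntry, full_privileges: bool) -> Dict[str, object]:
    """Return the row as a dict, masking restricted fields for limited users."""
    row = asdict(entry)
    if full_privileges:
        return row
    return {
        key: ("[hidden]" if key in RESTRICTED_FIELDS else value)
        for key, value in row.items()
    }


entry = LineListEntry(
    cluster="A", line_list_id=1, patient_name="John Smith",
    health_no="4513253244", age=26, sex="M", location="Vancouver",
    sample_id="F14231", collection_date="14/03/21",
    culture_result="Salmonella sp.",
)

print(view_for(entry, full_privileges=True))
print(view_for(entry, full_privileges=False))
```

Masking at the point where a row is rendered keeps a single underlying record while letting each persona see only what their role permits.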

Large-scale sequencing initiatives
en.wikipedia.org

FDA GenomeTrakr
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm

Public Health England project
(>10,000 Salmonella so far)
• As of 2015, sequencing every sampled Salmonella
isolate collected in England
• Over 10,000 sequenced to date
• 8000 already available for download in the public
databases

Gary van Domselaar, NML
The Global Microbial Identifier

What’s next?
2015: Oxford Nanopore MinION
??? per run
$900 / run, 6 hours
Huge pieces (max so far – 200-300 kilobases)
Can stop / restart using same disposable flowcell
15 cm (-ish)
thehightechsociety.com

Quick et al. (2015)
“Using a novel streaming phylogenetic
placement method samples can be
assigned to a serotype in 40 minutes and
determined to be part of the outbreak in less
than 2 h.”

Ebola monitoring
blogs.biomedcentral.com
Joshua Quick, Nick Loman

Example workflow
Load DNA → Samples evaluated against reference in real time → Positive ID / placement
6 hrs → Change flowcell

Challenges
• Sample extraction: getting DNA from stuff
• Clinical-grade evaluation
• Training
• Equipment reliability
• Sequencing errors
• Quality of reference data / attribution algorithms
• Database updates in real time
• Ethics / privacy (Genomes Sequenced While U Wait)

The Point
Comprehensive monitoring
Accurate typing
Rapid identification
Real-time decision making

Acknowledgements
PIs
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
Morag Graham – NML
Rob Beiko – Dalhousie
University of Lisbon
João Carriço
National Microbiology Laboratory (NML)
Franklin Bristow
Aaron Petkau
Thomas Matthews
Josh Adam
Adam Olsen
Tara Lynch
Shaun Tyler
Philip Mabon
Philip Au
Celine Nadon
Matthew Stuart-Edwards
Chrystal Berry
Lorelee Tschetter
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada
Peter Kruczkiewicz
Chad Laing
Vic Gannon
Matthew Whiteside
Ross Duncan
Steven Mutschall
Simon Fraser University (SFU)
Melanie Courtot
Emma Griffiths
Geoff Winsor
Julie Shay
Matthew Laird
Bhav Dhillon
Raymond Lo
BC Public Health Microbiology &
Reference Laboratory (PHMRL) and BC
Centre for Disease Control (BCCDC)
Judy Isaac-Renton
Patrick Tang
Natalie Prystajecky
Jennifer Gardy
Damion Dooley
Linda Hoang
Kim MacDonald
Yin Chang
Eleni Galanis
Marsha Taylor
Cletus D’Souza
Ana Paccagnella
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Burton Blais
Catherine Carrillo
Dominic Lambert
Dalhousie University
Alex Keddy
McMaster University
Andrew McArthur
Daim Sardar
European Nucleotide Archive
Guy Cochrane
Petra ten Hoopen
Clara Amid
European Food Safety Authority
Ernesto Liebana Criado
Francesco Vernazza
Valentina Rizzi

Seminar from Will Hsiao,
BC Centre for Disease Control

Materials to be available on
http://bioinformatics.ca/
June 24-26, 2015

The Bioinformatics Exam of the Future
tagc.com.au
commons.wikimedia.org/wiki/File:DNA_ahelatest_moodustunud_niit_katsuti_korgil..JPG
http://omicfrontiers.com/2014/06/11/diaryofaminion_part2/

2009 was a long time ago
J. Craig Venter Institute

Photo credit: Emma Allen-Vercoe
Some slides courtesy of Gary Van Domselaar, NML
