This document provides information about using whole genome sequencing (WGS) for microbial typing and epidemiology. It discusses using WGS for high-resolution strain discrimination and detection of antibiotic resistance and virulence genes. The ideal scenario is a method that can recover all current sequence-based typing information from a single experimental procedure. The document outlines various bioinformatics tools and approaches for WGS analysis including assembly, mapping, annotation, comparison and specialized databases. It emphasizes choosing analysis based on research questions. Gene-by-gene approaches are favored for their ability to classify strains while accounting for recombination. The document lists collaborators and proposes topics for a scientific program on genome-based microbial epidemiology.
3. http://en.wikipedia.org/wiki/File:ChronicleOfADeathForetold.JPG
• WGS in molecular typing:
  - Gene-by-gene: wgMLST, cgMLST, rMLST, MLST, eMLST, MLST+
  - SNP comparison approaches: comparison with reference strains
• Ability to recover most of the present sequence-based typing information in a single experimental procedure
4. The Ideal Scenario
Microbiological sample -> Magic Box of NGS Wonders for Microbiology -> Completely characterized strain:
• Antibiotic resistance profile
• Multilocus Sequence Typing (MLST)
• Virulence factors present
• Other sequence-based typing method (SBTM) information, e.g.:
  • spa (S. aureus)
  • emm (Group A Streptococcus)
Desired end result:
Risk assessment of the strain and useful application of the data to clinical practice
Comparison between groups of strains
6. My goals / areas that I want to apply WGS to:
• Microbial population structure
• Microbial evolution
• Microbial genomics: gene structure, genome synteny, mobile genetic element detection
My toolbox is chosen based on my questions and what I want to do!
Trying to avoid:
"I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." - Abraham H. Maslow (1962), Toward a Psychology of Being
10. - Perform the same analysis over tens, hundreds or thousands of strains: your own and publicly available ones
- Integrate multiple analyses in a single pipeline
- Pipelines = reproducibility (if not, something is very wrong)
http://www.ebi.ac.uk/ena
http://www.ncbi.nlm.nih.gov/sra
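A minimal Python sketch of the pipeline idea, under stated assumptions: the step functions (`clean`, `gc_content`), the sample data and the `run_pipeline` helper are hypothetical stand-ins, not part of any tool named in these slides. It only illustrates applying the same ordered steps with the same parameters to every strain and recording a fingerprint of the configuration, so that two runs can be compared.

```python
import hashlib
import json


def clean(sequence, params):
    """Normalize a raw sequence: strip whitespace and upper-case it."""
    return sequence.strip().upper()


def gc_content(sequence, params):
    """Return the GC fraction, rounded to params['digits'] decimal places."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence if base in "GC")
    return round(gc / len(sequence), params["digits"])


def run_pipeline(samples, steps, params):
    """Apply the same ordered steps, with the same parameters, to every sample."""
    results = {}
    for sample_id in sorted(samples):
        value = samples[sample_id]
        for step in steps:
            value = step(value, params)
        results[sample_id] = value
    # Fingerprint of step names and parameters: identical configurations give
    # identical fingerprints, so it can be stored next to the results.
    record = {"steps": [step.__name__ for step in steps], "params": params}
    fingerprint = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return results, fingerprint


# Toy input: in practice these would be reads or assemblies, your own or
# downloaded from a public archive.
samples = {"strainA": "atgcGC\n", "strainB": "ATATAT"}
results, fingerprint = run_pipeline(samples, [clean, gc_content], {"digits": 3})
print(results)
print(fingerprint)
```

Re-running with the same steps and parameters should give the same results and the same fingerprint; changing a parameter changes the fingerprint.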
11. • Gene-by-gene / extended MLST approaches are my favorite
• Why?
  - Allele-based classification "buffers" the effect of recombination in the analysis (a recombination event that introduces many SNPs into one locus still counts as a single allele change)
  - Stable nomenclature for alleles facilitates data exchange by schema creation
  - Easy to expand and visualize up to thousands of genomes with MST-like approaches
  - Lower computing requirements
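The "buffering" point can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the sequences are invented, and `assign_alleles` numbers alleles in order of first appearance rather than using any curated nomenclature. Strain B differs from strain A by one recombined locus (5 SNPs) and one point mutation.

```python
def assign_alleles(strain_loci):
    """Map each distinct sequence at a locus to an allele number (1, 2, ...)."""
    n_loci = len(next(iter(strain_loci.values())))
    seen = [dict() for _ in range(n_loci)]  # per-locus: sequence -> allele number
    profiles = {}
    for strain, loci in strain_loci.items():
        profile = []
        for i, seq in enumerate(loci):
            allele = seen[i].setdefault(seq, len(seen[i]) + 1)
            profile.append(allele)
        profiles[strain] = profile
    return profiles


def allelic_distance(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def snp_distance(loci_a, loci_b):
    """Number of nucleotide differences, assuming equal-length aligned loci."""
    return sum(
        sum(1 for x, y in zip(seq_a, seq_b) if x != y)
        for seq_a, seq_b in zip(loci_a, loci_b)
    )


strains = {
    "A": ["ATGAAACCC", "ATGGGGTTTAAA", "ATGCATCAT"],
    # Locus 2 replaced by recombination (5 SNPs), locus 3 has 1 point mutation.
    "B": ["ATGAAACCC", "ATGCCCTATGAA", "ATGCATCAA"],
}
profiles = assign_alleles(strains)
print(profiles)
print(allelic_distance(profiles["A"], profiles["B"]))  # 2 allele changes
print(snp_distance(strains["A"], strains["B"]))        # 6 SNPs
```

The recombined locus contributes five differences to the SNP count but only one to the allelic distance.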
12. • Bacterial Isolate Genome Sequence Database (BIGSdb)
  - Jolley & Maiden 2010, BMC Bioinformatics 11:595 - http://pubmlst.org/software/database/bigsdb/
  - PROs: Freely available, open-source, handles thousands of genomes, has MLST schemas implemented for several bacterial species, and some extended MLST and core genome MLST schemas (mainly Neisseria sp., but soon to be expanded)
  - CONs: Requires Perl knowledge to install and maintain
• Ridom SeqSphere+
  - http://www.ridom.com/seqsphere/
  - Commercial software with client-server solutions from assembly to allele calling and visualization for core genome MLST (MLST+/cgMLST)
• Applied Maths - BioNumerics 7.5
  - http://www.applied-maths.com/news/bionumerics-version-75-released
  - Commercial software with client-server solutions from assembly to allele calling and visualization for whole genome MLST (wgMLST)
13. Schema = set of loci to be used
What is a locus?
A gene or part of a gene
How to choose the loci:
1. Start from reference genomes
2. Decide if you want core genes only or core + accessory genes
3. Use a method to compare the CDS/ORFs of the reference genomes:
   1. OrthoMCL - www.orthomcl.org
   2. CD-HIT - cd-hit.org
4. Parse the output to:
   1. Remove paralogous genes
   2. Decide which are core genes and which are accessory genes
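Step 4 can be sketched in Python as follows. This is an illustration under assumptions: the `clusters` dictionary is a hypothetical, already-parsed representation of gene clusters (it is not the native output format of OrthoMCL or CD-HIT), and the `core_fraction` threshold is a free choice, not a value from these slides.

```python
def classify_loci(clusters, genomes, core_fraction=1.0):
    """Split gene clusters into core, accessory and paralogous lists.

    clusters: dict mapping cluster id -> list of (genome, gene_id) members.
    genomes: list of all reference genomes used to build the schema.
    core_fraction: minimum fraction of genomes a cluster must be present in
                   to be called core (1.0 = present in every genome).
    """
    core, accessory, paralogous = [], [], []
    for cluster_id, members in sorted(clusters.items()):
        copies = {}
        for genome, _gene_id in members:
            copies[genome] = copies.get(genome, 0) + 1
        if any(count > 1 for count in copies.values()):
            # More than one copy in a single genome: treat as paralogous
            # and leave it out of the schema.
            paralogous.append(cluster_id)
        elif len(copies) / len(genomes) >= core_fraction:
            core.append(cluster_id)
        else:
            accessory.append(cluster_id)
    return core, accessory, paralogous


genomes = ["g1", "g2", "g3"]
clusters = {
    "c1": [("g1", "a"), ("g2", "b"), ("g3", "c")],              # in all genomes
    "c2": [("g1", "d"), ("g2", "e")],                           # missing in g3
    "c3": [("g1", "f"), ("g1", "g"), ("g2", "h"), ("g3", "i")], # two copies in g1
}
core, accessory, paralogous = classify_loci(clusters, genomes)
print(core, accessory, paralogous)
```

Lowering `core_fraction` (for example to 0.95) would tolerate loci missing from a few draft genomes; whether that is appropriate depends on the question being asked.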
14. At this point different algorithms/software use:
- BLAST (n/p/x)
- Different criteria and parameters to call an allele as a coding sequence or part of a coding sequence
15. Allele-calling workflow (flowchart; box labels as recovered from the slide):
- Gene file: Self BLAST -> Calculate BSR; Translate gene file to protein -> Gene BLAST database
- Genome: Run Prodigal on genome -> Translate CDS to protein -> BLAST against the gene BLAST database -> Calculate BSR
- Decisions:
  - No BLAST match or BSR <= 0.6 -> LNF
  - BSR = 1 and same DNA sequence? -> Exact Match
  - LOT? -> LOT
  - BSR > 0.6 -> Add new allele to gene file -> Calculate BSR of the new allele -> Re-do gene BLAST database -> Inferred Allele
- Output: Allelic profile
Prodigal: Prokaryotic Dynamic Programming Gene-finding Algorithm
BSR: BLAST Score Ratio
LOT: Locus On the Tip (of a contig)
LNF: Locus Not Found
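The decision logic of this workflow can be sketched in Python. This is a sketch under assumptions, not the author's implementation: the BSR is taken here as the BLAST score of the best hit divided by the self-hit score of the reference allele, scores are passed in as plain numbers instead of being parsed from BLAST output, and the order in which the checks are applied is a guess from the flowchart labels. The function and argument names are hypothetical.

```python
def blast_score_ratio(hit_score, self_score):
    """BSR: score of the query hit divided by the reference's self-hit score."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def classify_locus(best_hit, self_score, known_alleles,
                   on_contig_tip=False, threshold=0.6):
    """Return 'LNF', 'EXACT', 'LOT' or 'INFERRED' for one locus in one genome.

    best_hit: None, or a (score, dna_sequence) tuple for the best BLAST hit.
    known_alleles: set of DNA sequences already in the gene file.
    on_contig_tip: True if the hit is truncated at the end of a contig.
    """
    if best_hit is None:
        return "LNF"
    score, dna = best_hit
    bsr = blast_score_ratio(score, self_score)
    if bsr <= threshold:
        return "LNF"
    if bsr >= 1.0 and dna in known_alleles:
        return "EXACT"
    if on_contig_tip:
        return "LOT"
    # BSR > threshold with a new sequence: the caller would add it to the gene
    # file, compute its self-score and rebuild the BLAST database.
    return "INFERRED"


known = {"ATGAAATAA"}
print(classify_locus(None, 200.0, known))                  # no BLAST match
print(classify_locus((100.0, "ATGCCCTAA"), 200.0, known))  # BSR 0.5
print(classify_locus((200.0, "ATGAAATAA"), 200.0, known))  # BSR 1, same DNA
print(classify_locus((200.0, "ATGAAGTAA"), 200.0, known))  # BSR 1, new DNA
print(classify_locus((150.0, "ATGAAGTAA"), 200.0, known, on_contig_tip=True))
```

Because the BSR is computed on protein alignments, a synonymous change can give BSR = 1 with a different DNA sequence, which is why the exact-match check also compares the DNA.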
19. Can be easily applied to:
- MLST
- MLVA
- SNP data*
- Gene Presence/absence
*Conversion of VCF to PHYLOViZ:
https://github.com/nickloman/misc-genomics-tools/blob/master/scripts/vcf2phyloviz.py
(Thanks Nick!)
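All of these data types reduce to a table of categorical profiles (allele numbers, repeat counts, SNP states or 0/1 presence/absence), so the same tree-building step applies to each. A minimal Python sketch, assuming invented profiles and using plain Prim's algorithm on Hamming distances with ties broken by name order; goeBURST applies its own tie-break rules, which this sketch does not reproduce.

```python
def hamming(profile_a, profile_b):
    """Number of positions at which two profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (parent, child, distance) edges."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        for u in sorted(in_tree):
            for v in names:
                if v in in_tree:
                    continue
                d = hamming(profiles[u], profiles[v])
                # Strict '<' keeps the first candidate found on ties.
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Toy allelic profiles; gene presence/absence would simply be 0/1 vectors.
profiles = {
    "ST1": [1, 1, 1, 1],
    "ST2": [1, 1, 1, 2],
    "ST3": [1, 1, 2, 2],
    "ST4": [3, 3, 3, 3],
}
tree = minimum_spanning_tree(profiles)
for parent, child, distance in tree:
    print(parent, child, distance)
```

This brute-force version recomputes distances on every pass, which is fine for a toy example but not for thousands of profiles.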
20. PROs:
Handles thousands of profiles
Fast calculation
Easy to annotate and explore metadata
Allows for basic statistics on profiles and metadata
Allows for advanced statistics on MSTs
(PLoS One. 2015 Mar 23;10(3):e0119315)
Exports high quality graphical formats
Allows plugin development
CONs:
goeBURST and goeBURST MST only
(Neighbour Joining and UPGMA soon)
Java knowledge needed to code new plugins
21. • MEGA (http://www.megasoftware.net/)
• SplitsTree (http://www.splitstree.org/)
• Geneious (http://www.geneious.com/)
  - Multipurpose software: very useful for sequence alignment visualization, tree building and annotation visualization (commercial software)
22. • No need to take sides when choosing an approach. Gene-by-gene, SNP and k-mer methods should be used depending on the problem at hand and the questions being asked
• Tools and sequencing methodologies are still evolving, which makes easy-to-use "big red button" approaches difficult to implement
• Beware of differences in software/algorithm versions, which can lead to different results
• Always be critical of the results you have, and try to understand whether you have a nail or a screw before picking up the hammer at hand
23. • UMMI Members:
  - Mickael Silva
  - Sergio Santos
  - Bruno Gonçalves
  - Adriana Policarpo
  - Mário Ramirez
  - José Melo-Cristino
• FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
  - Dag Harmsen (Univ. Muenster)
  - Stefan Niemann (Research Center Borstel)
  - Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
  - Joerg Rothganger (RIDOM)
  - Hannes Pouseele (Applied Maths)
• Genome Canada IRIDA project (Integrated Rapid Infectious Disease Analysis, www.irida.ca)
  - Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC)
  - Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC)
  - Fiona Brinkman (SFU)
  - William Hsiao (BCCDC)
• INESC-ID Members:
  - Alexandre Francisco
  - Cátia Vaz
  - Pedro Tiago Monteiro
• Twitter Microbial Bioinf community:
  - Nick Loman
  - Torsten Seemann
  - Will van Schaik
  - Mick Watson
  - Jennifer Gardy
  - Many, many others...
24. Draft Scientific Programme:
Plenaries:
1) Small Scale Microbial Epidemiology
2) Large Scale Microbial Epidemiology
3) Bioinformatics for Genome-based Microbial Epidemiology
4) Population Genetics: Pathogen Emergence
5) Population Dynamics: Transmission networks and surveillance
6) Molecular Epidemiology for Global Health and One Health
Parallel Sessions
1) Food and Environmental pathogens
2) Microbial Forensics
3) Virus
4) Fungi and Yeasts
5) Novel Diagnostics methodologies
6) Novel Typing approaches
7) Phylogenetic Inference
8) Interactive Illustration Platforms