How to Standardise and Assemble Raw Data into Sequences:
What Does it Mean for a Laboratory to Use Such Technologies?
Dr Joseph Hughes
11th OIE Seminar
Saskatoon - 17th June 2015
Decreasing sequencing cost
[Figure: cost per raw megabase of DNA sequence over time. x-axis: date (Jul-98 to Sep-17); y-axis: cost, log scale from $0.01 to $10,000. Source: http://www.genome.gov/sequencingcosts]
Democratization of sequencing
[Figure: see http://omicsmaps.com]
Applications of high-throughput sequencing
•  Whole genome sequencing
•  Genome variability within a host
•  De novo assembly of novel viruses
•  Metagenomics of communities
Considerations for a genome assembly pipeline
•  Flexible pipeline: handles unknown genotypes or virus samples
•  Platform independent: works with data from different platforms
•  Virus independent: works on any virus
•  Scalable to hundreds or thousands of samples
•  Accuracy of SNP calling in the genome (matters most in outbreak analysis, where samples are more closely related)
[Flowchart: assembly pipeline, with one branch for a known reference and one for an unknown reference]
Pre-assembly processing
•  Check format (sff, fastq)
•  Convert to FASTQ
•  Remove adaptor contaminants
•  Remove host genome contamination
•  Quality & length trimming
Assembly
•  Reference assembly (known reference)
•  De novo assembly (unknown reference)
•  Contig merge
•  Scaffolding contigs
•  Validation
Post-assembly processing
•  Consensus
•  Variant calling
•  Classification
•  Annotation
•  Genome comparison
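
As a rough illustration of the "Quality & length trimming" step, the sketch below trims low-quality bases from the 3' end of each read and then discards reads that end up too short. It is a deliberately simple stand-in, not the algorithm used by trim_galore or any other tool. The function names, the four-line FASTQ layout, the Phred+33 encoding and the thresholds (`min_q=20`, `min_len=30`) are all assumptions made for the example.

```python
def read_fastq(handle):
    """Yield (header, sequence, quality) from a FASTQ file handle.

    Assumes the common four-lines-per-record layout (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def quality_and_length_trim(records, min_q=20, min_len=30, offset=33):
    """Quality-trim every record, then keep only reads of at least min_len bases."""
    for header, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q, offset)
        if len(seq) >= min_len:
            yield header, seq, qual
```

In use, `quality_and_length_trim(read_fastq(open("reads.fastq")))` would stream trimmed records from a hypothetical `reads.fastq`. Trimming before the length filter matters: a read can pass a length check on its raw sequence and still be too short once its low-quality tail is removed.
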
Examples
1.  1999-2001 in Northern Italy: emergence of highly pathogenic avian influenza (HPAI) H7N1
•  Identify known molecular markers for viral pathogenicity in intra-host viral populations
•  OIE & FAO reference lab for Influenza
2.  2010 in the Netherlands: die-off of >1000 wild water frogs and newts
•  Isolation and characterisation of the "Dutch frog killer" and its relationship to known viruses
•  Van Beurden et al. (2014). Genome Announc.
[Photo: hybrid edible frog (Pelophylax kl. esculentus)]
Example 1: Characterization of HPAI signature mutations
Monne et al. (2014). Journal of Virology
Pre-assembly processing
trim_galore and FastQC for quality control
Reference assemblers?
•  Hash-based tools: Mosaik, Novoalign, Stampy, Tanoti
•  Burrows-Wheeler Transform-based tools: BWA, Bowtie2, NextGenMap
Too many to choose from
http://www.bioinformatics.cvr.ac.uk/Tanoti
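
The two families of tools named above differ in how they index the reference. The toy sketch below shows the core idea of each: a k-mer hash table used to seed candidate alignment positions, and a naive Burrows-Wheeler transform. It does not reflect how any of the listed tools is actually implemented; the function names and the tiny sequences are made up for illustration, and real aligners use far more compact structures such as spaced seeds and FM-indexes.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Hash-based idea: map every k-mer to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_votes(read, index, k):
    """Count how many read k-mers support each candidate start position."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], []):
            start = pos - j  # where the read would begin on the reference
            if start >= 0:
                votes[start] += 1
    return votes


def bwt(text):
    """BWT idea: last column of the sorted rotations of text + '$'.

    Quadratic in memory, so only suitable for tiny demonstration strings.
    """
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)
```

With the hash index, the start position with the most votes is the natural place to attempt a full alignment. The BWT instead groups identical contexts together, which is what allows compressed indexes to search a reference without storing every k-mer explicitly.
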
[Figure: log10 depth of coverage (DOC) against position for each influenza segment (HA, M, NA, NP, NS, PA, PB1, PB2). Legend: Bowtie2 and Stampy; Tanoti]
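
The quantity plotted per segment, log10 of the depth of coverage (DOC) at each position, can be derived from alignment coordinates roughly as follows. This is a minimal sketch: the input is assumed to be a list of 0-based, end-exclusive `(start, end)` intervals already extracted from an alignment file, and positions with zero depth are returned as `None` because log10(0) is undefined. How the original plots treated uncovered positions is not stated.

```python
import math


def depth_of_coverage(alignments, ref_length):
    """Per-position read depth from (start, end) intervals on one segment.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum gives the depth at each position.
    """
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        s = max(0, min(start, ref_length))   # clamp to the segment
        e = max(s, min(end, ref_length))
        diff[s] += 1
        diff[e] -= 1
    depth = []
    running = 0
    for i in range(ref_length):
        running += diff[i]
        depth.append(running)
    return depth


def log10_depth(depth):
    """log10 of each depth value; uncovered positions become None."""
    return [math.log10(d) if d > 0 else None for d in depth]
```

Plotting `log10_depth(...)` against position for each segment and each aligner would give curves comparable in form to the figure, and makes dips in coverage from one aligner easy to spot against another.
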
Tablet - assembly
Variant calling: detecting true mutations
•  Many tools: LoFreq, V-Phaser, DiversiTools
•  Using replicates to validate mutations (e.g. FMDV experiments)
One LPAI sample, collected after the identification of HPAI, carried an HA cleavage site and multiple HPAI-associated mutations at extremely low frequency
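
One simple way to use replicates for validation is to keep only the variants that are called in every replicate above a minimum frequency. The sketch below shows that idea. The data layout (one dict of variant label to frequency per replicate), the function name and the 1% threshold are assumptions for illustration, and this is not the procedure used by LoFreq, V-Phaser or DiversiTools.

```python
def validated_variants(replicates, min_freq=0.01):
    """Return variants present in all replicates at or above min_freq.

    replicates: list of dicts mapping a variant label to its frequency.
    The result maps each validated label to its per-replicate frequencies.
    """
    if not replicates:
        return {}
    shared = set(replicates[0])
    for rep in replicates[1:]:
        shared &= set(rep)
    return {
        variant: [rep[variant] for rep in replicates]
        for variant in sorted(shared)
        if all(rep[variant] >= min_freq for rep in replicates)
    }
```

A variant seen at low frequency in only one replicate is more likely to be a PCR or sequencing error than one that recurs independently, which is why the intersection is taken before the frequency filter is applied.
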
[Figure: two panels, "Frequency of LPAI in HPAI samples" and "Frequency of HPAI in LPAI samples", each showing amino acid changes against samples.
Amino acid changes (both panels): PB2_I398T, PB1_D154G, PB1_G216S, PB1_E745K, PA_T61I, PA_K115N, PA_K252E, HA_A130T, HA_T146A, HA_E228A, HA_T454A, HA_R554K, NP_A349T, NP_N376S, NA_K173R, M1_A166V, NS1_I136V, NS1_N139D, NS1_-225R.
Samples, first group: X4756.99, X4827.99, X4828.99, X4911.99, X4708.99, X4618.99, X4618.99.1, X4749.99.
Samples, second group: X4295.99, X3675.99, X4829.99, X1744.99, X2732.99, X3283.99]
Example 2: Isolation and Sequencing
•  From dead wild water frog in September 2013
•  Suspension from pooled internal organs
•  Inoculated on BF-2 cells (bluegill fry fibroblast cells)
•  DNA extracted using DNeasy kit (viral purity of 67%)
•  DNA sheared by sonication
•  KAPA library preparation
•  MiSeq (Illumina) Machine #2 test run: total run 26,700,000 reads including 50% PhiX (16 Gb)
•  13,127,123 paired-end 300 bp reads from the sample (7.9 Gb)
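
The yields quoted above can be cross-checked with a little arithmetic. The figures only reproduce if each counted "read" is a pair of 2 x 300 bp reads, so that is assumed here. The depth estimate at the end additionally assumes that the 67% viral purity applies to sequenced bases and uses the final genome length of 107,772 bp reported later in the deck, so it should be read as an order-of-magnitude guide only.

```python
READ_LENGTH_BP = 300
READS_PER_PAIR = 2          # assumption: the counts on the slide are read pairs

total_pairs = 26_700_000    # whole run, including PhiX
sample_pairs = 13_127_123   # reads from the frog virus sample
viral_purity = 0.67         # assumption: applies to the fraction of bases
genome_length_bp = 107_772  # final genome length from the finishing step


def bases(pairs):
    """Total sequenced bases for a given number of read pairs."""
    return pairs * READS_PER_PAIR * READ_LENGTH_BP


total_bases = bases(total_pairs)              # about 16 Gb
sample_bases = bases(sample_pairs)            # about 7.9 Gb
sample_fraction = sample_pairs / total_pairs  # roughly half; the rest is PhiX

# Rough average depth over the viral genome under the assumptions above.
approx_viral_depth = sample_bases * viral_purity / genome_length_bp
```

Under these assumptions the sample accounts for just under half of the run, consistent with the stated 50% PhiX, and the implied depth over the viral genome is in the tens of thousands.
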
Assembly
•  ABySS (abyss-pe) de novo assembler reconstructed the full genome in a single contig of 107,260 bp
•  5 different regions had ambiguous/repetitive sequences
•  Re-sequencing ambiguous regions with Sanger
[Figure: contig map with coordinates 1-1692, 1693-21168, 21359-38364, 38387-66887, 67100-73322 and 73434-107260; five regions marked "?"]
Finishing assembly
•  CodonCode Aligner for assembling and checking the Sanger sequences
•  SequencePatcher.pl to stitch the Sanger sequences into the de novo contig
•  iCORN2
•  Final genome: 107,260 bp => 107,772 bp
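
Stitching Sanger sequences into a contig amounts to replacing coordinate ranges of the contig with the re-sequenced bases. The sketch below illustrates that operation only; it is not SequencePatcher.pl, whose interface and algorithm are not described in the slides. The function name, the 1-based inclusive coordinates and the toy sequences are assumptions.

```python
def patch_contig(contig, patches):
    """Replace regions of a contig with re-sequenced fragments.

    patches: list of (start, end, replacement) with 1-based inclusive
    coordinates on the ORIGINAL contig. Patches must not overlap.
    """
    ordered = sorted(patches)
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("patches overlap")
    patched = contig
    # Apply from right to left so that earlier coordinates stay valid
    # even when a replacement changes the length of the contig.
    for start, end, replacement in reversed(ordered):
        patched = patched[:start - 1] + replacement + patched[end:]
    return patched
```

Because replacements can be longer or shorter than the region they cover, the patched genome can change length, as it did here when the contig grew from 107,260 bp to 107,772 bp.
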
Annotating
•  BLAST to find the most similar annotated genome
•  Most similar: Common Midwife Toad Virus (CMTV) from Spain
•  Transfer of annotations from CMTV to the full genome (RATT)
•  Identifies inappropriate start codons and frame-shifts
•  Correction of transferred gene models using Artemis
[Figure: comparison of genomes RGV JQ654586, STIV EU627010, FV3 KJ175144, FV3 AY548484, TFV AF389451, CGSIV KF512820, ADRV KF033124, ADRV KC865735, CMTV NL, CMTV JQ231222, ATV AY150217, EHNV FJ433873 and ESV JQ724856. Scale bar: 20 kb. Node values as extracted: 84, 95, 100, 100, 76, 100, 100, 100]
Standard formats
•  FASTQ: quality score depends on the technology and base caller
•  SAM: soon v1.5 extensions
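
Because FASTQ quality encoding depends on the platform and base caller, a pipeline has to know the ASCII offset before it can interpret scores. The sketch below decodes a quality string under a given offset, converts a Phred score to an error probability, and applies a common heuristic for guessing the offset. The function names are made up for this example, and the guess is only a heuristic: a file of uniformly high-quality Phred+33 reads could be misclassified.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(char) - offset for char in quality]


def error_probability(q):
    """Probability that a base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def guess_offset(quality_strings):
    """Heuristic guess of the encoding offset from the lowest character seen.

    Characters below ASCII 59 can only occur with an offset of 33;
    a minimum of 64 or more suggests an offset of 64. Values in between
    are ambiguous and None is returned.
    """
    lowest = min(ord(char) for quality in quality_strings for char in quality)
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None
```

Decoding with the wrong offset shifts every score by 31, which would make quality trimming either discard almost everything or discard nothing, so the "Check format" step earlier in the pipeline is worth doing explicitly.
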
Genome standards: 5 categories
Ladner et al. (2014) mBio
[Table, layout not fully recoverable. % genome covered for the five categories: >50%, ~80-90%, ~90-99%, 100%, 100%. HTS coverage values shown: ~15-30x, >100x, ~400-1000x. RACE is also listed]
Challenges: Rates of increase in data
[Figure: hard disk storage (MB/$) and DNA sequencing (bp/$) by year, 1990-2012, both on log scales. Doubling times: hard disk storage 14 months; pre-NGS sequencing 19 months; NGS sequencing 5 months. Source: http://genomebiology.com/2010/11/5/207]
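
The doubling times in the figure translate into yearly growth factors with the relation factor = 2 ** (elapsed / doubling_time). The short sketch below works this through for one year; the function name is an assumption, and the doubling times are those read from the figure.

```python
def growth_factor(doubling_time_months, elapsed_months):
    """Multiplicative growth after elapsed_months for a given doubling time."""
    return 2 ** (elapsed_months / doubling_time_months)


# Growth per dollar over 12 months, using the doubling times from the figure.
storage_per_year = growth_factor(14, 12)   # hard disk storage (MB/$)
pre_ngs_per_year = growth_factor(19, 12)   # pre-NGS sequencing (bp/$)
ngs_per_year = growth_factor(5, 12)        # NGS sequencing (bp/$)

# How much faster sequencing output per dollar grows than storage per dollar.
ngs_vs_storage = ngs_per_year / storage_per_year
```

With these inputs, sequence per dollar grows a little over fivefold in a year while storage per dollar does not quite double, so a fixed budget buys roughly three times more data relative to storage each year.
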
Challenges: resources and technologies
•  Shift towards more data: labs need to have dedicated bioinformaticians
•  Rule of thumb: invest as much in computers and data scientists as in sequencing equipment and lab technicians
•  Non-uniform coverage, repeat regions, systematic biases, PCR errors, sequencing errors, sequence length
CVR bioinformatics team
Director of OIE Collaborating Centre for Viral Genomics and Bioinformatics
Director of Centre for Virus Research

How to Standardise and Assemble Raw Data into Sequences: What Does it Mean for a Laboratory to Use Such Technologies?

  • 1. How to Standardise and Assemble Raw Data into Sequences:
 What Does it Mean for a Laboratory to Use Such Technologies?" Dr Joseph Hughes! ! !11th OIE Seminar! Saskatoon - 17th June 2015!
  • 2. Decreasing sequencing cost! $0.01 $0.10 $1.00 $10.00 $100.00 $1,000.00 $10,000.00 Jul-98 Apr-01 Jan-04 Oct-06 Jul-09 Apr-12Dec-14Sep-17 Cost per raw Megabase of DNA sequence! http://www.genome.gov/ sequencingcosts! Democratization of sequencing! http://omicsmaps.com!
  • 3. Applications of High throughput sequencing" •  Whole genome sequencing! •  Genome variability within a host! •  De-novo assembly of novel viruses! •  Metagenomics of communities!
  • 4. Considerations for a genome assembly pipeline •  Flexible pipeline: Handling unknown genotypes or virus samples! •  Platform independent: work with data from different platforms! •  Virus independent: work on any virus! •  Scalable to hundreds or thousands of samples! •  Accuracy of SNP calling in the genome (outbreak analysis where samples are more closely related)!
  • 5. Known reference" Unknown reference" Pre-assembly " Processing" Check format (sff, fastq) ! Convert to FASTQ! Remove adaptor contaminants! Remove host genome contamination! Quality & length trimming! Reference assembly! De-novo assembly! Contig merge! Scaffolding contigs! Validation! Consensus! Variant calling! Classification! Assembly" Post-assembly processing" Annotation! Genome comparison!
  • 6. Examples. 1. 1999-2001 in Northern Italy: emergence of highly pathogenic avian influenza H7N1 • Identify known molecular markers for viral pathogenicity in intra-host viral populations • OIE & FAO reference lab for influenza. 2. 2010 in the Netherlands: die-off of >1000 wild water frogs and newts • Isolation, characterisation and relationship to known viruses of the Dutch frog killer • Van Beurden et al. (2014). Genome Announc. [Image: hybrid edible frog (Pelophylax kl. esculentus)]
  • 7. Example 1: Characterization of HPAI signature mutations. Monne et al. (2014). Journal of Virology.
  • 9. Reference assemblers? • Hash-based tools: Mosaik, Novoalign, Stampy, Tanoti • Burrows-Wheeler Transform-based tools: BWA, Bowtie2, NextGenMap. Too many to choose from. http://www.bioinformatics.cvr.ac.uk/Tanoti [Figure: log10 depth of coverage (DOC) by position for the eight influenza segments (HA, M, NA, NP, NS, PA, PB1, PB2), comparing Bowtie2 and Stampy with Tanoti.]
  • 11. Variant calling: detecting true mutations • Many tools: LoFreq, Vphaser, DiversiTools • Using replicates to validate mutations (e.g. FMDV experiments) • One LPAI sample, collected after the identification of HPAI, had an HA cleavage site and multiple HPAI-associated mutations at extremely low frequency. [Figure: two heat maps, "Frequency of LPAI in HPAI samples" and "Frequency of HPAI in LPAI samples", with amino acid changes on one axis and samples on the other. Amino acid changes: PB2_I398T, PB1_D154G, PB1_G216S, PB1_E745K, PA_T61I, PA_K115N, PA_K252E, HA_A130T, HA_T146A, HA_E228A, HA_T454A, HA_R554K, NP_A349T, NP_N376S, NA_K173R, M1_A166V, NS1_I136V, NS1_N139D, NS1_-225R. Samples, first panel: X4756.99, X4827.99, X4828.99, X4911.99, X4708.99, X4618.99, X4618.99.1, X4749.99. Samples, second panel: X4295.99, X3675.99, X4829.99, X1744.99, X2732.99, X3283.99.]
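The idea of using replicates to validate mutations can be sketched as a set intersection: a variant is kept only if it passes a frequency threshold in every replicate. This is a hedged illustration, not the procedure of LoFreq, Vphaser or DiversiTools; the function name, the 1% threshold, the variant labels and the frequencies are all hypothetical.

```python
def validated_variants(replicates, min_freq=0.01):
    """Return variants at or above min_freq in every replicate.

    replicates: list of dicts mapping a variant label to its frequency.
    The threshold is an illustrative assumption.
    """
    passing = [
        {variant for variant, freq in rep.items() if freq >= min_freq}
        for rep in replicates
    ]
    if not passing:
        return set()
    return set.intersection(*passing)


# Hypothetical variant labels and frequencies for two sequencing replicates.
rep_a = {"mutA": 0.020, "mutB": 0.004, "mutC": 0.150}
rep_b = {"mutA": 0.030, "mutC": 0.120, "mutD": 0.050}

print(sorted(validated_variants([rep_a, rep_b])))
```

Here `mutB` fails the threshold and `mutD` appears in only one replicate, so only `mutA` and `mutC` survive.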
  • 12. Example 2: Isolation and sequencing • From a dead wild water frog in September 2013 • Suspension from pooled internal organs • Inoculated on BF-2 cells (bluegill fry fibroblast cells) • DNA extracted using DNeasy kit (viral purity of 67%) • DNA sheared by sonication • KAPA library preparation • MiSeq (Illumina) Machine #2 test run: total run of 26,700,000 reads including 50% PhiX (16 Gb) • 13,127,123 paired-end 300 bp reads from the sample (7.9 Gb)
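The yields quoted on this slide can be checked with simple arithmetic. The sketch below assumes that the quoted counts are read pairs and that every read is the full 300 bp; both are assumptions, chosen because they reproduce the quoted figures after rounding.

```python
def yield_gb(read_pairs, read_len, reads_per_pair=2):
    """Sequence yield in gigabases, assuming every read is full length."""
    return read_pairs * reads_per_pair * read_len / 1e9


sample = yield_gb(13_127_123, 300)   # reads from the frog virus sample
total = yield_gb(26_700_000, 300)    # whole run, including PhiX

print(f"sample: {sample:.1f} Gb, total run: {total:.1f} Gb")
```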
  • 13. Assembly • The Abyss-pe de-novo assembler reconstructed the full genome in a single contig of 107,260 bp • 5 different regions had ambiguous/repetitive sequences • Ambiguous regions were re-sequenced with Sanger. [Figure: contig map with coordinates 1, 1692, 1693, 21168, 21359, 38364, 38387, 66887, 67100, 73322, 73434, 107260; five regions marked "?".]
  • 14. Finishing assembly • CodonCode Aligner for assembling and checking the Sanger sequences • SequencePatcher.pl to stitch the Sanger sequences into the de-novo contig • iCORN2 • Final genome: 107,260 bp => 107,772 bp
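Stitching re-sequenced regions into a contig can be pictured as replacing coordinate ranges with new sequence. The sketch below is only an illustration of that idea and makes no claim about how SequencePatcher.pl works; the function name, the 1-based inclusive coordinate convention and the toy sequences are assumptions.

```python
def patch_contig(contig, patches):
    """Replace regions of a contig with re-sequenced (e.g. Sanger) sequence.

    patches: list of (start, end, replacement) tuples, with 1-based inclusive
    coordinates on the ORIGINAL contig. Patches are assumed not to overlap.
    """
    out = contig
    # Apply patches right to left so earlier coordinates remain valid
    # even when a replacement has a different length.
    for start, end, replacement in sorted(patches, key=lambda p: p[0], reverse=True):
        out = out[:start - 1] + replacement + out[end:]
    return out


contig = "ACGTNNNNACGT"                      # toy contig with an ambiguous stretch
patched = patch_contig(contig, [(5, 8, "GGCCTT")])
print(patched, len(contig), "->", len(patched))
```

Because a replacement can be longer than the region it covers, the patched genome can grow, which is consistent with the change in final length reported on the slide.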
  • 15. Annotating • BLAST to find the most similar annotated genome • Common Midwife Toad Virus (CMTV) from Spain • Transfer of annotations from CMTV to the full genome (RATT) • Identifies inappropriate start codons and frame-shifts • Correction of transferred models using Artemis
  • 16. [Figure: tree with a 20 kb scale bar. Taxa: RGV JQ654586, STIV EU627010, FV3 KJ175144, FV3 AY548484, TFV AF389451, CGSIV KF512820, ADRV KF033124, ADRV KC865735, CMTV NL, CMTV JQ231222, ATV AY150217, EHNV FJ433873, ESV JQ724856. Node support values: 84, 95, 100, 100, 76, 100, 100, 100.]
  • 17. Standard formats • FASTQ: quality score depends on the technology and base caller • SAM: soon v1.5 extensions
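One reason FASTQ quality scores depend on the technology is the encoding offset: Sanger and recent Illumina output use Phred+33, while Illumina 1.3 to 1.7 pipelines used Phred+64. The following minimal sketch decodes a quality string under a chosen offset and converts a Phred score to an error probability; the function names are illustrative.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(char) - offset for char in quality]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


print(phred_scores("II5+"))              # Phred+33 encoding
print(phred_scores("h", offset=64))      # the same Q40 under Phred+64
print(error_probability(20))             # Q20 is a 1 in 100 error rate
```

Decoding with the wrong offset shifts every score by 31, so checking the encoding is a sensible part of the "check format" step before trimming.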
  • 18. Genome standards: 5 categories. Ladner et al. (2014) mBio. % genome covered, in order of category: >50%; ~80-90%; ~90-99%; 100%; 100%. HTS coverage, as listed on the slide: ~15-30x; >100x; RACE; ~400-1000x.
  • 19. Challenges: Rates of increase in data. [Figure: disk storage (Mbytes/$) and DNA sequencing (bp/$) by year, 1990 to 2012, log scales. Hard disk storage (MB/$): doubling time 14 months. Pre-NGS (bp/$): doubling time 19 months. NGS (bp/$): doubling time 5 months. Source: http://genomebiology.com/2010/11/5/207]
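The doubling times in this figure translate into fold increases with the formula 2^(months elapsed / doubling time). The sketch below applies it over a five-year window; the window length and the function name are arbitrary choices for illustration.

```python
def fold_increase(years, doubling_months):
    """Fold increase after `years`, given a doubling time in months."""
    return 2 ** (years * 12 / doubling_months)


doubling_times = [
    ("Hard disk storage (MB/$)", 14),
    ("Pre-NGS sequencing (bp/$)", 19),
    ("NGS sequencing (bp/$)", 5),
]

for label, months in doubling_times:
    print(f"{label}: about {fold_increase(5, months):,.0f}-fold in 5 years")
```

With a 5-month doubling time, sequencing output per dollar goes through 12 doublings in five years, far outpacing storage at 14 months per doubling.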
  • 20. Challenges: resources and technologies • With the shift towards more data, labs need dedicated bioinformaticians • Rule of thumb: invest as much in computers and data scientists as in sequencing equipment and lab technicians • Technical issues: non-uniform coverage, repeat regions, systematic biases, PCR errors, sequencing errors, sequence length
  • 21. CVR bioinformatics team • Director of OIE Collaborating Centre for Viral Genomics and Bioinformatics • Director of Centre for Virus Research