Next Gen Sequencing (NGS) Technology Overview
1. Next Gen Sequencing [NGS]
• History of DNA Sequencing
– Maxam-Gilbert
– Sanger
– ABI
• NGS Technologies:
– 454, Illumina, PacBio, ABI, Helicos,
– Ion Torrent, Nanopores
• Applications:
– Genomes, RNASeq, ChIPSeq, CGH, CancerGenome, Environmental
Human Genome: 1990-2000
Presented by Dominic Suciu, Ph.D.
2. Preliminaries: Central Dogma
Gene ~ Protein ~ Enzyme
Gene (DNA)
[Program in directory]
Protein (PolyPeptide)
[Program in RAM]
~~ Enzyme ~~
Functional agent
Messenger RNA
Genome (DNA)
[Hard drive]
3. Preliminaries: Phages
BacterioPhages are viruses that infect bacteria
Some Bacteria are immune to certain phages
[Hamilton O. Smith, early 70's]
Restriction Endonucleases: Enzymes that specifically
cleave certain DNA sequences.
Bacterial cells use these as a crude anti-phage
defense mechanism
4. Preliminaries: Restriction Enzymes
• Molecular scissors
• Their discovery allowed researchers to physically map genomes
• Big confirmatory clue that Genome sequence determines species and even individuals
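The "molecular scissors" idea can be sketched in a few lines of Python. This is a minimal illustration, not lab software: it assumes the EcoRI recognition site GAATTC, cut between G and A on the top strand, and the function names (find_cut_sites, digest) are invented here. It scans one strand only and ignores sticky ends.

```python
def find_cut_sites(dna, site="GAATTC", cut_offset=1):
    """Return 0-based positions where the top strand would be cut.

    cut_offset=1 models EcoRI cutting G^AATTC (after the first base).
    """
    dna = dna.upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        # Step by one base so overlapping sites are not missed.
        start = dna.find(site, start + 1)
    return cuts


def digest(dna, site="GAATTC", cut_offset=1):
    """Split the sequence into the fragments left after a complete digest."""
    dna = dna.upper()
    bounds = [0] + find_cut_sites(dna, site, cut_offset) + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(digest("AAGAATTCTTGAATTCGG"))
```

The list of fragment lengths is what a physical map is built from: two genomes with different sequences give different fragment patterns for the same enzyme.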
5. Preliminaries: Cloning
Start with picograms of DNA
End up with micrograms of highly purified copies
Each Colony is highly enriched
Each colony is endlessly amplifiable
pBR322 is a vector, an
engineered plasmid.
It can reproduce itself
inside a bacterial host
and do nothing else.
7. Deconstructing Sequencing
• DNA source: gel-purified fragment, cloning
product, random fragmentation.
• DNA Amplification: need enough to be able to detect
signal given off by base interrogation
• DNA Seq Method: Base interrogation method to
uniquely detect G,A,T,C bases.
• Sequence Positioning: Need an organizing principle to
place these bases into a sequence.
The methods presented here represent unique ways to solve each of these issues
9. Maxam-Gilbert 1975
Chemical Sequencing
Issues:
• Need perfectly pure single species of DNA
• Nasty Chemicals
• Radioactive End-labeling
• 4-lanes/read
• Sequence only what you can purify
Advantages:
- 1st DNA sequencing available
- 200-300 bp/read
Fragment
population
distribution
corresponds to
appearance of
base within
sequence
10. Sanger “Sequencing-by-Synthesis” 1977
Issues:
- Radioactive End-labeling
- 4-lanes/read
- Sequencing gels
Advantages:
- 400-500 bp/read
- Radioactive Incorporation
- Primer gives you control
dNTP ddNTP
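The 4-lanes/read principle behind both Maxam-Gilbert and Sanger can be sketched as follows: each lane holds the fragments that end at one base, and reading the bands from shortest to longest across all four lanes gives the sequence. This is a toy model under simplifying assumptions (every fragment length is resolved, no primer offset); simulate_lanes and read_gel are names made up for this sketch.

```python
def simulate_lanes(sequence):
    """Build four lanes: for each base, the fragment lengths that end at it.

    A fragment ending at position i (0-based) has length i + 1.
    """
    lanes = {base: [] for base in "GATC"}
    for i, base in enumerate(sequence.upper()):
        lanes[base].append(i + 1)
    return lanes


def read_gel(lanes):
    """Read the sequence back by sorting all bands by fragment length."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = simulate_lanes("GATTACA")
    print(lanes)
    print(read_gel(lanes))
```

Read length is limited by how well the gel separates fragments that differ by one base, which is why the bp/read figures on these slides are in the hundreds.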
11. PCR Dye-Terminator 1990's
Issues:
- Sequencing gels
- 1 run/day
Advantages:
- 600-700 bp/read
- 96 reads/run
- Each terminator dye has a different
color. Lets you combine all 4 reactions
in one lane.
- Single lane/read
- Primer gives you control
12. Human Genome Project (15 years) Hierarchical
Shotgun Sequencing [start 1990]
- Randomly insert Human DNA into BAC clones (~150kbp each)
- Combine these BAC clones to create a scaffold of the human
genome. Each BAC clone will be mapped to a region on a
Human Chromosome
- Pass BAC clones to different Genome Centers throughout US
- At each center, each vector is sequenced using shotgun sequencing
- Wait 15 years for results.
13. Issues with Shotgun Sequencing
• Reads -> contigs -> scaffolds -> genome reconstruction
• Repeat regions can confuse Contig assemblers.
• It was hoped that by focusing each shotgun run to a single 40-150kb region, these
issues would be minimized.
• According to Venter, it simply multiplied the number of times one encountered the
same problem
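The reads -> contigs step can be illustrated with a greedy overlap merge. This is only a sketch of the idea, not the assembler used by any genome project: it assumes error-free reads from one strand, and the names overlap and greedy_assemble, as well as the min_len threshold, are chosen for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

A repeat longer than a read makes two different reads overlap equally well with the same neighbor, so the merge can join the wrong pair. That is the confusion the slide refers to.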
14. Shotgun Sequencing: Venter 1997
Same approach is used throughout NGS
Paired-end sequencing:
1. Randomly cut genomic DNA.
2. Use Gel-purification to make three
libraries of random DNA fragments:
2kb, 10kb, 50kb
3. Sequence from both ends.
4. Use distance information to assemble
contigs into scaffolds.
Distance information allows you to
'jump' over repeat regions.
This approach allowed Venter to 'jump'
over the federal sequencing project
15. NGS Revolution: Roche / 454 -> [2005]
ABI 3700 state of the art
in 1997
- 1 sample per rxn (96
rxns) in 2 hrs
- Each sample had to be
individually manipulated
454 solved both these problems
PPi + H+
Paired-end reads can be done by including both primers on each micro-bead
Emulsion PCR:
16. Roche / 454 -> [2005]
• emPCR: No need for
cells
• Each well is a single
sequencing run.
• Very fast reaction
18. Illumina
Advantages:
• No need for cells
• Each cluster of DNA
molecules is a single reaction.
• Enormous amounts of reads
• Paired ends: sequence from
both sides.
Disadvantages:
• Slow
• Short reads
• Reagent costs
25. Applications: Genome Sequencing
Sequencing of whole genomes: bacterial, animal, human.
De novo Genome Sequencing: Even with the large number of
reads, putting a genome together from raw sequence reads is still
a non-trivial task, due to sample prep and inherent complexity.
Re-sequencing:
Sequencing an individual with a genetic disease in
order to find hereditary mutations.
Read depth allows one to compute allele
frequencies.
454: Due to its long reads, this method is best for de novo.
Useful for scaffolding.
SOLiD, Illumina: used for re-sequencing
SOLiD: wins on accuracy, loses on
complexity/cost
Complete Genomics: CRO model, depth 40x
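How read depth turns into allele frequencies can be shown with a small sketch. It assumes the reads have already been aligned and that the bases covering one genome position have been collected into a string (a "pileup"); allele_frequencies is a name invented for this example, and base-quality filtering is left out.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def allele_frequencies(pileup):
    """Fraction of reads supporting each base at one position.

    pileup: the bases observed at that position, one per read.
    Anything other than A, C, G, T (for example N) is ignored.
    """
    counts = Counter(b.upper() for b in pileup if b.upper() in VALID_BASES)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


if __name__ == "__main__":
    # Depth 40: 30 reads show A, 10 reads show G.
    print(allele_frequencies("A" * 30 + "G" * 10))
```

At low depth the same calculation is dominated by sampling noise and sequencing errors, which is why depth figures such as 40x are quoted for re-sequencing.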
26. Applications: Exon Sequencing
Mutational screening: what are the mutations in the actual
coding regions?
Most heritable disease models have mutations in the
coding regions.
Use enrichment to focus sequencing to expressed space.
Then make as many reads as possible in order to
accurately compute mutations.
Illumina, 454, ABI
27. Enrichment: Microarrays are Not dead!
Why?:
In order to focus sequencing run on the
region you are interested in.
Ex:
• Expressed region of genome (1%)
• Genes of interest: mutational studies.
Four ways:
• Micro-droplet PCR: each droplet has
unique set of amplification primers.
• MIP-PCR
• On-chip enrichment, using
microarrays.
• On-bead enrichment: make oligo
pools, use them to capture targets for
sequencing.
28. Two approaches for finding causative mutation responsible
for Miller Syndrome
Sequence Whole Genome: Complete Genomics
• Sequenced Mother, Father and 2 kids (both affected) 1 kindred
• Regions where they share both copies from parents (22%)
• Both diseases are rare: look for locations with low prevalence
SNPs (dbSNP)
• Narrowed down to 4 genes
• 2 of these were found to be causative agent in exome sequencing
study
Exome Array: Just sequence expressed sequence space
(1%): Illumina GAII
• Sequenced genomes from 4 affected individuals in 3
kindreds
• Found 4600 mutants
• Ignored any previously discovered SNPs from dbSNP
• Looked for mutations that appeared in all 3 kindreds
• Focused on damaging mutations: non-synonymous, stop
codon
• Discovered causative locus by elimination
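The filtering-by-elimination steps of the exome approach can be sketched as a chain of set operations. Everything concrete here is hypothetical: the record layout (id, gene, effect, kindred), the gene and variant names in the usage example, and the choice to compare kindreds at the gene level are assumptions made for illustration, not details taken from the study.

```python
DAMAGING_EFFECTS = {"nonsynonymous", "stop"}


def candidate_genes(variants, known_snp_ids, kindreds):
    """Return genes carrying a novel, damaging variant in every kindred.

    variants: list of dicts with keys "id", "gene", "effect", "kindred"
    known_snp_ids: set of variant ids already catalogued (e.g. in dbSNP)
    kindreds: iterable of all kindred labels in the study
    """
    kindreds_hit = {}
    for v in variants:
        if v["id"] in known_snp_ids:
            continue  # previously discovered SNP: ignore
        if v["effect"] not in DAMAGING_EFFECTS:
            continue  # synonymous changes are not considered damaging
        kindreds_hit.setdefault(v["gene"], set()).add(v["kindred"])
    required = set(kindreds)
    return sorted(g for g, seen in kindreds_hit.items() if seen >= required)


if __name__ == "__main__":
    variants = [
        {"id": "v1", "gene": "GENE_A", "effect": "nonsynonymous", "kindred": 1},
        {"id": "v2", "gene": "GENE_A", "effect": "stop", "kindred": 2},
        {"id": "v3", "gene": "GENE_A", "effect": "nonsynonymous", "kindred": 3},
        {"id": "v4", "gene": "GENE_B", "effect": "nonsynonymous", "kindred": 1},
        {"id": "v5", "gene": "GENE_C", "effect": "synonymous", "kindred": 2},
        {"id": "known1", "gene": "GENE_D", "effect": "stop", "kindred": 3},
    ]
    print(candidate_genes(variants, {"known1"}, [1, 2, 3]))
```

Each filter shrinks the candidate list, so a few thousand raw variants can be reduced to a handful of loci without any prior hypothesis about the gene.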
29. Applications: RNA-Seq
Microarrays are Dead!
Don't have to design probes ahead of time, just sequence
mRNA and count number of sequences for each gene.
Read count ~ Expression level
In environmental genomics, sequencing can be used to
determine which genes are being expressed in a sample.
Illumina: Only method that has the read depth to get
useful spread between high and low-expressed
genes.
Its Dynamic Range far surpasses microarrays in this
respect, especially for smaller genomes.
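The "Read count ~ Expression level" relation can be made concrete with a short counting sketch. It assumes each read has already been assigned to a gene by an aligner, and it scales raw counts to counts per million so that runs of different depth can be compared; expression_cpm and the gene names are invented for this example, and gene-length correction is left out.

```python
from collections import Counter


def expression_cpm(read_gene_assignments):
    """Count reads per gene and scale to counts per million reads.

    read_gene_assignments: one gene name per aligned read.
    """
    counts = Counter(read_gene_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


if __name__ == "__main__":
    reads = ["geneA"] * 3 + ["geneB"]
    print(expression_cpm(reads))
```

Because a weakly expressed gene needs only a few reads to be detected, the dynamic range grows with the total number of reads, which is the advantage the slide attributes to high-depth platforms.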