Next generation sequencing in pharmacogenomics
1. Gerry Higgins, Ph.D., M.D.
Vice President, Pharmacogenomic Science
AssureRx Health, Inc.
AssureRx Health, Inc. CONFIDENTIAL 1
2. » Explosive Growth in Sequence Data
» The “Big Data” Problem
» The “Diminishing Discovery” Problem
» Human Genome Variation and Pharmacogenomics
» Evolution of next generation sequencing (NGS) technology
» Future Trends
4. Explosive Growth in Sequence Data
As the cost of DNA sequencing falls,
the growth of human genome data becomes exponential
5. The “Big Data” Problem
Lee Hood, IOM February 27, 2012
6. The “Big Data” Problem
“The world is shifting to an innovation economy and nobody does innovation better than America.”
- President Obama, 12/6/2011
Pillars of Bioeconomy R&D:
1) Synthetic Biology
2) Proteomics
3) Information Technology: Bioinformatics & Computational Biology
8. The “Diminishing Discovery” Problem
FDA’s Solution: Adaptation in the Pre-Competitive Space
SCREENING TRIAL
Investigational drugs & associated PGx markers -> Achieve surrogate end point predictive of clinical outcome -> Promising drug candidate & associated PGx marker
CONFIRMATORY TRIAL
Promising drug candidate & associated PGx marker -> Replicate surrogate end point; Achieve clinical outcome (regulatory standard for FDA approval)
FDA APPROVAL
Accelerated drug approval with approval of PGx biomarker; Full drug approval
*Slide adapted, with permission, from Janet Woodcock and Issam Zineh, CDER, FDA
9. The “Diminishing Discovery” Problem
Pre-Competitive Collaboration: Solution for Pharma
• Share use cases/questions: gaps in current tools
• Identify common solutions & options
• Share development risk/costs
• Build interoperability standards into platforms
• Publicly share experiences - good & bad
• PPP (public-private-partnership) infrastructure
• Build portable talent base/experts across sites
• Compile innovations from participating groups
• Follow European model: share trial participants
• Faster path for FDA drug approval
10. The “Diminishing Discovery” Problem
tranSMART: Bioinformatics & shared data analytics platform
• tranSMART is an open source informatics software platform that allows pharmaceutical, diagnostic and medical device companies to share “pre-competitive” data and a set of common tools for analysis of data. The license protects the intellectual property of all stakeholders.
• Dr. Eric Perakslis, now CIO and Chief Scientist (Informatics) at the FDA, originally developed tranSMART when he served as a research scientist at Johnson & Johnson. tranSMART is based on the i2b2 informatics platform.
• tranSMART has been adopted more broadly in Europe than in the U.S. An example of a study where “pre-competitive” data were shared (KM: Knowledge Management): U-BIOPRED (Unbiased BIOmarkers in PREDiction of respiratory disease outcomes)1
1Bel EH et al. Diagnosis and definition of severe refractory asthma: an international consensus statement from the Innovative Medicine Initiative (IMI). Thorax. 2011; 66(10): 910.
11. One Mind Integrative Informatics Platform
Genome, Proteome, Signaling, Phenome, Disease: Integrative Analyses Managed Thru Cloud-Based Portal
One Mind Portal™: builds off of the tranSMART Data Knowledge Management System
13. Human Genome Variation as determined by NGS
“The ability of sequencing to detect a site that is segregating in the population is dominated by two factors:
1. Whether the non-reference allele is present among the individuals chosen for sequencing, and;
2. The number of high quality and well mapped reads that overlap the variant site in individuals who carry it.
Simple models show that for a given total amount of sequencing, the number of variants discovered is maximized by sequencing many samples at low coverage. This is because high coverage of a few genomes, while providing the highest sensitivity and accuracy in genotyping a single individual, involves considerable redundancy and misses variation not represented by those samples.”1
[Bar chart: share of known versus novel variants, 0% to 100%, for transposons, duplications, deletions, insertions and SNPs]
Genome variants of different types, determined by low coverage sequencing of individuals, trios (e.g., mother, father and daughter) and exons. These data are derived from the 1000 Genomes Project.1
• Note that they did not attempt to resolve Copy Number Variants (CNVs) or Variable Number of Tandem Repeats (VNTRs), which convey inter-individual variation.
• Note the large percentage of novel SNPs that were discovered by NGS.
1Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
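The quoted trade-off between sample count and coverage can be illustrated with a toy model. The sketch below is not the model used by Durbin et al.; it assumes independent diploid samples, Poisson-distributed reads on the variant allele (mean of half the coverage, as for a heterozygous carrier), and a hypothetical calling rule of at least `min_alt_reads` supporting reads within a single sample. All function names are illustrative.

```python
import math


def poisson_tail(lam: float, k: int) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def discovery_probability(n_samples: int, coverage: float,
                          allele_freq: float, min_alt_reads: int = 2) -> float:
    """Chance that a variant is seen in at least one sequenced sample.

    Assumptions (illustrative only): samples are independent and diploid,
    reads on the variant allele of a carrier ~ Poisson(coverage / 2),
    and a call needs `min_alt_reads` supporting reads in one sample.
    """
    p_carrier = 1.0 - (1.0 - allele_freq) ** 2   # sample carries >= 1 copy
    p_detect = poisson_tail(coverage / 2.0, min_alt_reads)
    return 1.0 - (1.0 - p_carrier * p_detect) ** n_samples


# Same total budget (samples x coverage = 400) split three ways.
for n, cov in [(10, 40), (100, 4), (400, 1)]:
    p = discovery_probability(n, cov, allele_freq=0.01)
    print(f"{n:4d} samples at {cov:2d}x -> P(discover) = {p:.3f}")
```

Under these assumptions, 100 samples at 4x find a 1% allele more often than 10 samples at 40x. The 1x split does worse only because this toy rule requires two reads within one sample; callers that pool evidence across samples would behave differently.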
14. Genome Variation and Pharmacogenomics
Some important points about Single Nucleotide Polymorphisms (SNPs):
• All methods to determine human genome variation contain error.
• So-called “common” SNPs, with a frequency of >0.5%, have yielded modest effects in genome-wide association scans (GWAS) as determinants of complex diseases.
• Early results from pharmacogenomic GWAS appear to indicate a greater ability to discover SNPs with substantial effect size. Nevertheless, they do not explain the full extent of human genome variation and drug response. Pharmacogenomic GWAS are limited in power by small cohort sizes.1
• Although each human genome may have ~3 M SNPs, only some of these variants are deleterious.
• SNPs have been the easiest genomic variant to measure, but other variants, such as Copy Number Variants (CNVs), may be more important determinants of drug response.2
• Most variants that impact individual drug response have not yet been identified.3*
1Guessous, I., Gwinn, M. & Khoury, M.J. Genome-wide association studies in pharmacogenomics: untapped potential for translation. Genome Med 1, 46 (2009); Group, S.C. et al. SLCO1B1 variants and statin-induced myopathy - a genome-wide study. N Engl J Med 359, 789-799 (2008); Sato, Y. et al. A new statistical screening approach for finding pharmacokinetics related genes in genome-wide studies. Pharmacogenomics J 9, 137-146 (2009); Crowley, J.J., Sullivan, P.F. & McLeod, H.L. Pharmacogenomic genome-wide association studies: lessons learned thus far. Pharmacogenomics 10, 161-163 (2009).
2Rasmussen HB et al. Genome-wide identification of structural variants in genes encoding drug targets: possible implications for individualized drug therapy. Pharmacogenetics and Genomics. July 2012. 22 (7): 471-483.
3Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073. *FDA.
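To make the point about cohort size concrete, here is a rough power calculation for a single SNP. It is a sketch under stated assumptions, not a method from the cited papers: a two-sided allelic test with a normal approximation, two alleles per person, and the conventional genome-wide threshold of 5e-8. The allele frequencies in the example are hypothetical.

```python
from statistics import NormalDist


def allelic_test_power(freq_cases: float, freq_controls: float,
                       n_cases: int, n_controls: int,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    # Each person contributes two alleles to the comparison.
    a_cases, a_controls = 2 * n_cases, 2 * n_controls
    pooled = (freq_cases * a_cases + freq_controls * a_controls) / (a_cases + a_controls)
    se = (pooled * (1 - pooled) * (1 / a_cases + 1 / a_controls)) ** 0.5
    z_effect = abs(freq_cases - freq_controls) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic clears the significance threshold.
    return norm.cdf(z_effect - z_crit)


# Hypothetical SNP: allele frequency 0.30 in responders, 0.20 in non-responders.
for n in (100, 300, 1000):
    print(n, round(allelic_test_power(0.30, 0.20, n, n), 4))
```

With the same effect, power rises from close to zero at 100 subjects per group to high at 1,000 per group, which is why small pharmacogenomic cohorts can only detect large effects.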
15. Genome Variation and Pharmacogenomics
Allele-Specific PCR cannot accurately detect SNPs1:
[Figure: two sites labeled “Unknown SNP”]
1Favis, R. Applying next generation sequencing to pharmacogenomics studies in clinical trials.
16. Genome Variation and Pharmacogenomics
High throughput genotyping platforms cannot accurately resolve allelic variants of the CYP2D6 gene1:
Genome-wide arrays, some that are specifically configured to examine pharmacogene variants, were poor at discriminating CYP2D6 alleles.
1Gamazon ER et al. The limits of genome-wide methods for pharmacogenomics testing. Pharmacogenetics and Genomics. 2012. 22: 261-272.
17. Genome Variation and Pharmacogenomics
Some important points about Next Generation Sequencing (NGS):
• All methods to determine human genome variation contain error.
• All “short read” NGS methods rely on the use of a “reference genome” as ground truth, when the various reference genomes have been shown to have unusual variation1.
• Short read NGS technology is fraught with errors, and thus either requires 60-100 fold coverage for a single individual, or low coverage whole genome sequence data from a large population2. The most accurate results have been obtained from sequencing the whole genomes of closely-related individuals, along with inclusion of other data related to family medical history1,3.
• Short read NGS technology is especially poor at calling variants in GC-rich regions of the genome such as CpG islands.
• The real value is provided by long read technology, which has been implemented by Complete Genomics, but they have a backlog of genomes to sequence under contract (~27,354 as of 6/12).
• So-called “clinical” or bench-top sequencers, such as Illumina’s MiSeq or Life Technologies’ Ion Torrent, manifest all the problems associated with short read technology, including extensive pre-processing of tissue samples and complex data analysis.
1Dewey et al. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet. 2011 September; 7(9): e1002280.
2Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
3Patel CJ et al. Data-driven integration of epidemiological and toxicological data to select candidate interacting genes and environmental factors in association with disease. Bioinformatics. 2012 Jun 15; 28(12): i121-i126.
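The 60-100 fold coverage figure translates into read counts through the usual relation coverage = reads x read length / genome size. The sketch below assumes a 3.1 Gb genome and 100-base reads (both illustrative values, not taken from the slides) and uses the Lander-Waterman Poisson approximation for the fraction of bases that no read covers.

```python
import math


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads required so that reads * read_length / genome_size >= target."""
    return math.ceil(target_coverage * genome_size / read_length)


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of bases hit by zero reads."""
    return math.exp(-coverage)


GENOME_SIZE = 3_100_000_000   # assumed human genome size in bases
READ_LENGTH = 100             # assumed short-read length in bases

for cov in (4, 60, 100):
    n = reads_needed(cov, READ_LENGTH, GENOME_SIZE)
    gap = expected_uncovered_fraction(cov)
    print(f"{cov:3d}x: {n:,} reads, expected uncovered fraction {gap:.2e}")
```

The Poisson estimate ignores GC bias and mapping failures, so real gaps in regions such as CpG islands are larger than this idealized figure suggests.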
18. Genome Variation and Pharmacogenomics
Whole genome sequencing & analysis has been able to resolve pharmacogene variation on a genome-wide level, including the various alleles of the CYP2D6 gene1:
Allele | Effect on Metabolism | Allele | Effect on Metabolism | Allele | Effect on Metabolism
*1 | Fully functional | *14 | Null | *33 | Fully functional
*2 | Fully functional | *14A | Null | *35 | Fully functional
*3 | Null | *14B | Null | *36 | Pseudogene
*4 | Null | *15 | Null | *37 | Reduced activity
*5 | Null | *16 | Null | *38 | Null
*6 | Null | *17 | Reduced activity | *39 | Pseudogene
*7 | Null | *18 | Null | *40 | Null
*8 | Null | *19 | Null | *41 | Reduced activity
*9 | Reduced activity | *20 | Null | *42 | Null
*10 | Reduced activity | *25 | Pseudogene | *43 | Pseudogene
*10AB | Reduced activity | *26 | Pseudogene | *44 | Null
*11 | Reduced activity | *29 | Reduced activity | *45 | Reduced activity
*12 | Null | *30 | Pseudogene | *46 | Reduced activity
*13 | Null | *31 | Pseudogene | *56 | Reduced activity
1Black JL et al. Frequency of undetected CYP2D6 hybrid genes in clinical samples: Impact on phenotype prediction. Drug Metab Dispos June 2012 40:1238; Patents: United States Patent Application 20120088247.
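A table like this is typically consumed as a lookup from star allele to functional effect. The following is a minimal sketch using a subset of the rows above; the function name and the "Unknown" fallback are illustrative choices, and no diplotype-to-phenotype rule is implied because the slide does not give one.

```python
# Subset of the slide's table: CYP2D6 star allele -> effect on metabolism.
CYP2D6_EFFECT = {
    "*1": "Fully functional",
    "*2": "Fully functional",
    "*4": "Null",
    "*5": "Null",
    "*10": "Reduced activity",
    "*17": "Reduced activity",
    "*41": "Reduced activity",
}


def describe_diplotype(diplotype: str) -> tuple:
    """Split a diplotype such as '*1/*4' and look up each allele's effect.

    Alleles missing from the table are reported as 'Unknown', not guessed.
    """
    alleles = [allele.strip() for allele in diplotype.split("/")]
    return tuple(CYP2D6_EFFECT.get(allele, "Unknown") for allele in alleles)


print(describe_diplotype("*1/*4"))
print(describe_diplotype("*10/*41"))
```

Returning "Unknown" for unlisted alleles keeps the lookup honest about hybrid or novel alleles, which is the failure mode the cited Black et al. paper addresses.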
20. Trends in Next Generation Sequencing
Attribute | 2010 | 2013
Generation | 2nd Generation NGS | 3rd Generation NGS
Fundamental technology | SBS or degradation | Direct physical inspection of the DNA molecule using nanopore, high speed camera and/or silicon chip technology
Resolution | Averaged across many copies of the DNA molecule being sequenced | Single-molecule resolution
Raw read accuracy | High, with >60-fold coverage | High; missed variant calls: 1 in 500 kb to 1M bases
Read length | Short, ~35 bases, generally much shorter than Sanger sequencing | Long, 10,000 bp and longer
Throughput | High | Highest
Current cost | Low cost per base | Lowest cost per base
RNA-sequencing | cDNA sequencing | Direct RNA sequencing and cDNA sequencing
Start-to-Finish | Days | One hour per whole genome
Sample preparation | Complex, library and PCR amplification required | Very simple
Data analysis | Complex because of large data volumes and because short reads complicate assembly and alignment algorithms | Complex because of large data volumes; however, those can be solved by new high speed camera and chip technologies
Primary results | Base calls with quality values | Base calls with quality values, other base information such as kinetics, structural variants and phased haplotypes
21. Trends in Next Generation Sequencing
2nd Generation NGS - Short read archive:
• Hardware and Service Companies, Market Share: Illumina and Complete Genomics sequenced over 90% of all genomes as of 10/1/11 (ref. 1)
[Pie chart: Percentage of Whole Human Genomes Sequenced, by vendor: Illumina, Complete Genomics, Life Technologies, Others]
• Concordance of variant calls: Illumina versus Complete Genomics short read1
Concordance between platforms (one individual, 76-fold coverage, ~3.7M SNPs): SNPs 88.1%; Indels 26.5%
1Lam HL et al. Performance comparison of whole-genome sequencing platforms. Nature Biotech. 2012. 30: 78-82.
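A concordance figure of this kind can be computed by comparing the two platforms' call sets. The sketch below assumes concordance means the share of variants called by either platform that both platforms call (intersection over union); Lam et al. may define it differently, and the variant tuples are made-up examples.

```python
def concordance(calls_a: set, calls_b: set) -> float:
    """Share of variants called by either platform that both platforms call."""
    union = calls_a | calls_b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(calls_a & calls_b) / len(union)


# Hypothetical calls keyed by (chromosome, position, reference, alternate).
platform_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
platform_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}

print(f"concordance = {concordance(platform_a, platform_b):.1%}")
```

Keying each call on position plus alleles matters for indels, where the same event can be reported at different coordinates; without normalization, such representation differences would lower the measured agreement.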
22. Next Generation Sequencing: Update 6/12
Columns: Company | Product(s) | Tech | Problems | Prognosis
(Company names appeared as logos and were not captured.)
Company 1
Product(s): HiSeq; MiSeq clinical sequencer* (*FDA-approved Type III device)
Tech: 2nd generation - Short read
Problems: Too expensive; should have taken buyout from Roche; dominate market, believe they can also dominate MDx
Prognosis: Will eventually be acquired at bargain price, or merge; best candidate for M&A is BGI.
Company 2
Product(s): Sequencing-as-a-service
Tech: 2nd generation - Short read (75% of business); 3rd generation (25%)
Problems: Just laid off 55 employees, restructuring so as to only focus on clinical markets, no more life sciences research. Need to switch to long read technology ASAP, but can’t because of sequence backlog.
Prognosis: Long read technology is very accurate, but have “over-committed”, including Mayo, ARUP, INOVA, Partners, etc. Will survive …
Company 3
Product(s): Personal genome machine; Exome
Tech: 2nd generation - Short read
Problems: Tiny market share; already pushed back dates on Ion Torrent Exome to 9/12
Prognosis: Company is diversified enough to subsidize sequencing hardware
Company 4
Product(s): GridIron and Mini-Ion
Tech: 3rd generation, long read, licensed from Winters-Hilt
Problems: No credibility; USB mini-pore can only sequence one genome in closed system, expensive.
Prognosis: Long read technology is accurate, company has over $150M funding, who knows?
Company 5
Product(s): Not named yet
Tech: 3rd generation, long read, licensed from Winters-Hilt
Problems: “Still working on the chemistry”. CEO won’t discuss status of company…
Prognosis: Long read technology is very accurate, represents optimal solution, will survive.
23. NGS: Complete Genomics, Inc.
24. NGS: Long Read Nanopore Solutions
Complete Genomics
Their most recent technology involves combining a very high speed CCD (charge-coupled device) camera with each DNA base tagged with a fluorochrome coming through a nanopore.
• They have achieved 500 Kb read lengths; the claimed error rate is “1 missed base call variant every 500 Kb” (Lee Hood).
• They have been able to resolve phased maternal and paternal chromosomes.
• They can resolve distributed repeats (e.g. pseudogenes).
• However, their in-house pre- and post-processing steps are very complex and time-consuming, their turnaround time for a human genome with a coverage of 10-fold is 72 days, and they now have a backlog of 25,000 genomes.
Workflow:
1. Extract and fragment DNA
2. Each base (A, C, G, T) tagged with a different fluorochrome
3. Multi-planar graphene array
4. High-speed CCD camera: can capture every base per pixel with DNA traveling at ~10 base pairs per second.
25. NGS: Long Read Nanopore Solutions
Ideal System1
The latest device from Rosenstein et al.1 can accurately sequence 1 million base pairs of double-stranded DNA without error.
• Most researchers interested in using nanopores to directly sequence DNA have slowed the DNA velocity in the nanopore translocation stage by adding an enzyme ratchet (as at Oxford Nanopore Technology) to accommodate the low bandwidths available. These researchers instead used complementary metal-oxide semiconductor (CMOS) processing and integrated circuits technology.
• They have been able to redesign their system to increase the bandwidth above 50MHz, with a very low signal-to-noise ratio to sequence an entire human genome with very little sample preparation in 20 minutes.
Workflow:
1. Extract DNA.
2. Pass “naked” DNA through graphene nanopore array.
3. High bandwidth CMOS pre-amplifier positioned under every pore.
4. Solid state silicon nitride membrane chip mounted in the fluid cell.
1Rosenstein JK et al. Integrated nanopore sensing platform with sub-microsecond temporal resolution. Nature Methods. 2012. 9 (5): 487-492.
26. WGA: Clinical Interpretation Software
Whole Genome Analysis - “The $1,000 genome and the $1M interpretation.”
3 major approaches:
• Filter data followed by complex analysis: used by Cypher Genomics and Illumina
• Apply proprietary natural language processing algorithms against whole genome or whole exome data: used by Silicon Valley Biosystems
• Genomic best linear unbiased prediction (GBLUP) method to evaluate predictive ability by cross-validation. GBLUP approaches take into account the covariance structure inferred from the genomic data. Best predictive accuracy1,2
1Ober U et al. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster. PLoS Genetics. May 2012. 8 (5): 1-14.
2Jones B. Predicting phenotypes. Nature Reviews Genetics. 2012. 13. doi:10.1038/nrg3267
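The GBLUP idea in the third bullet can be sketched in a few dozen lines. This is a simplified illustration, not the procedure of Ober et al.: the genomic relationship matrix is taken as G = ZZ'/m from column-centered genotypes, the ratio of residual to genetic variance (`lam`) is fixed by hand instead of estimated, and leave-one-out stands in for their cross-validation scheme. The toy genotypes and phenotypes are invented.

```python
def genomic_relationship(genotypes):
    """G = Z Z^T / m, with Z the column-centered genotype matrix (simplified)."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    z = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / m for k in range(n)]
            for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gblup_predict(g, y, train, test, lam=1.0):
    """Predict test phenotypes: mu + G[test, train] (G[train, train] + lam I)^-1 (y - mu)."""
    mu = sum(y[i] for i in train) / len(train)
    lhs = [[g[i][k] + (lam if i == k else 0.0) for k in train] for i in train]
    alpha = solve(lhs, [y[i] - mu for i in train])
    return [mu + sum(g[t][i] * a for i, a in zip(train, alpha)) for t in test]


def leave_one_out(genotypes, y, lam=1.0):
    """Cross-validated predictions: each sample is predicted from all the others."""
    g = genomic_relationship(genotypes)
    idx = list(range(len(y)))
    return [gblup_predict(g, y, [i for i in idx if i != t], [t], lam)[0] for t in idx]


# Toy data: two genotype clusters (allele counts 0/1/2) with different phenotypes.
genotypes = [[2, 2, 0, 0]] * 3 + [[0, 0, 2, 2]] * 3
phenotypes = [3.0, 3.0, 3.0, 1.0, 1.0, 1.0]
print(leave_one_out(genotypes, phenotypes))
```

Each held-out sample is pulled toward the phenotypes of its genomic relatives, which is how the covariance structure enters the prediction; comparing these held-out predictions with the observed phenotypes gives the predictive ability the slide refers to.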
27. WGA: Clinical Interpretation Software
Whole Genome Analysis - Example from Cypher Genomics
28. WGA: Clinical Interpretation Software
Whole Genome Analysis - Example from Cypher Genomics
31. Lab & Technology Operations
Lab
• Results delivered within one business day of receipt of a patient’s DNA sample
• CLIA certified
• CAP accredited
• NY State Department of Health certified
Technology
• Advanced bioinformatics
• World-class data center operations
• Secure Internet protocols
• HIPAA compliant architecture
• Data integration with Facility Health Information Management Systems
Editor's notes
Methods based on single experiments and gene properties alone are not enough for multifactorial diseases. Information about disease involvement is encoded across multiple platforms.