Next generation sequencing in pharmacogenomics

Gerry Higgins, Ph.D., M.D.
Vice President, Pharmacogenomic Science
AssureRx Health, Inc.

AssureRx Health, Inc. CONFIDENTIAL 1

» Explosive Growth in Sequence Data
» The ‘Big Data’ Problem
» The ‘Diminishing Discovery’ Problem
» Human Genome Variation and Pharmacogenomics
» Evolution of next generation sequencing (NGS) technology
» Future Trends


Explosive Growth in Sequence Data

As the cost of DNA sequencing falls,
the growth of human genome data becomes exponential


The ‘Big Data’ Problem

Lee Hood, IOM February 27, 2012

The ‘Big Data’ Problem
“The world is shifting to an
innovation economy and nobody
does innovation better than
America.”
—President Obama, 12/6/2011

Pillars of Bioeconomy R&D:
1) Synthetic Biology
2) Proteomics
3) Information Technology: Bioinformatics & Computational Biology


The ‘Diminishing Discovery’ Problem


FDA’s Solution: Adaptation in the Pre-Competitive Space

SCREENING TRIAL
• Investigational drugs & associated PGx markers
• Achieve surrogate end point predictive of clinical outcome
• Promising drug candidate & associated PGx marker

CONFIRMATORY TRIAL
• Promising drug candidate & associated PGx marker
• Replicate surrogate end point
• Achieve clinical outcome (regulatory standard for FDA approval)

FDA APPROVAL
• Accelerated drug approval with approval of PGx biomarker
• Full drug approval

*Slide adapted, with permission, from Janet Woodcock and Issam Zineh, CDER, FDA


Pre-Competitive Collaboration: Solution for Pharma

• Share use cases/questions – gaps in current tools
• Identify common solutions & options
• Share development risk/costs
• Build interoperability standards into platforms
• Publicly share experiences - good & bad
• PPP (public-private-partnership) infrastructure
• Build portable talent base/experts across sites
• Compile innovations from participating groups
• Follow European model – share trial participants
• Faster path for FDA drug approval


tranSMART: Bioinformatics & shared data analytics platform
• tranSMART is an open source informatics software platform that allows
pharmaceutical, diagnostic and medical device companies to share “pre-competitive”
data and a set of common tools for analysis of data. The license protects the
intellectual property of all stakeholders.
• Dr. Eric Perakslis, now CIO and Chief Scientist (Informatics) at the FDA, originally
developed tranSMART when he served as a research scientist at Johnson &
Johnson. tranSMART is based on the i2b2 informatics platform.
• tranSMART has been adopted more broadly in Europe than in the U.S. An example
of a study where “pre-competitive” data were shared (KM: Knowledge
Management):

U-BIOPRED (Unbiased BIOmarkers in PREDiction of respiratory disease outcomes)1
1Bel EH et al. Diagnosis and definition of severe refractory asthma: an international consensus statement from the Innovative Medicines Initiative (IMI). Thorax. 2011 66(10):910


One Mind Integrative Informatics Platform
Genome | Proteome | Signaling | Phenome | Disease

Integrative Analyses Managed Thru Cloud-Based Portal

One Mind Portal™: builds off of the tranSMART Data Knowledge Management System

Human Genome Variation as determined by NGS
“The ability of sequencing to detect a site that is segregating in the population is dominated by two
factors:
1. Whether the non-reference allele is present among the individuals chosen for sequencing, and;
2. The number of high quality and well mapped reads that overlap the variant site in individuals who
carry it.
Simple models show that for a given total amount of sequencing, the number of variants discovered is
maximized by sequencing many samples at low coverage. This is because high coverage of a few
genomes, while providing the highest sensitivity and accuracy in genotyping a single individual, involves
considerable redundancy and misses variation not represented by those samples.”1
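
The trade-off described in the quotation can be illustrated with a small numerical sketch. This is not the model used by Durbin et al. It assumes that reads are pooled across samples, that read depth is Poisson-distributed, that there are no sequencing errors, and that the number of variant sites at allele frequency f is proportional to 1/f. The function names, the 800-fold sequencing budget and the two-read detection threshold are all illustrative assumptions.

```python
import math

def prob_detect(freq, n_samples, coverage, min_alt_reads=2):
    """Probability that a variant at population frequency `freq` is detected.

    Factor 1: the non-reference allele must be present among the 2*n sampled
    chromosomes. Factor 2: enough reads must overlap it in the carriers.
    """
    n_chrom = 2 * n_samples
    total = 0.0
    for copies in range(1, n_chrom + 1):
        # Binomial probability (in log space) of `copies` alt alleles in the sample.
        log_pmf = (math.lgamma(n_chrom + 1) - math.lgamma(copies + 1)
                   - math.lgamma(n_chrom - copies + 1)
                   + copies * math.log(freq)
                   + (n_chrom - copies) * math.log(1.0 - freq))
        # Each chromosome receives coverage/2 reads on average (Poisson depth).
        lam = copies * coverage / 2.0
        p_too_few = sum(math.exp(-lam) * lam ** j / math.factorial(j)
                        for j in range(min_alt_reads))
        total += math.exp(log_pmf) * (1.0 - p_too_few)
    return total

def fraction_discovered(n_samples, total_coverage, freqs):
    """Share of variant sites found when a fixed budget is split over n samples."""
    coverage = total_coverage / n_samples
    weights = [1.0 / f for f in freqs]  # assumed 1/f site frequency spectrum
    found = sum(w * prob_detect(f, n_samples, coverage)
                for w, f in zip(weights, freqs))
    return found / sum(weights)

FREQS = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
BUDGET = 800  # total fold-coverage to spend, e.g. 10 genomes at 80x

few_deep = fraction_discovered(10, BUDGET, FREQS)
many_shallow = fraction_discovered(200, BUDGET, FREQS)
print(f"10 samples at 80x : {few_deep:.2f} of sites discovered")
print(f"200 samples at 4x : {many_shallow:.2f} of sites discovered")
```

Under these assumptions the shallow, many-sample design finds a larger share of sites, because rare alleles are usually absent from a small panel no matter how deeply it is sequenced.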

Genome variants of different types, determined by low coverage sequencing of individuals, trios (e.g., mother, father and daughter) and exons. These data are derived from the 1000 Genomes Project.1
[Figure: bar chart, 0% to 100%, of known versus novel variants by type: Transposons, Duplications, Deletions, Insertions, SNPs]
• Note that they did not attempt to resolve Copy Number Variants (CNVs) or Variable Number of Tandem Repeats (VNTRs), which convey inter-individual variation.
• Note the large percentage of novel SNPs that were discovered by NGS.
1Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.

Genome Variation and Pharmacogenomics
Some important points about Single Nucleotide Polymorphisms (SNPs):
• All methods to determine human genome variation contain error.
• So-called “common” SNPs, with a frequency of >0.5%, have yielded only modest effects in genome-wide association scans (GWAS) of complex diseases.
• Early results from pharmacogenomic GWAS appear to indicate a greater ability to discover SNPs
with substantial effect size. Nevertheless, they do not explain the full extent of human genome
variation and drug response. Pharmacogenomic GWAS are limited in power by small cohort sizes.1
• Although each human genome may have ~3 M SNPs, only some of these variants are deleterious.
• SNPs have been the easiest genomic variant to measure, but other variants, such as Copy Number
Variants (CNVs), may be more important determinants of drug response.2
• Most variants that impact individual drug response have not yet been identified.3*
1Guessous, I., Gwinn, M. & Khoury, M.J. Genome-wide association studies in pharmacogenomics: untapped potential for translation. Genome Med 1, 46 (2009); SEARCH Collaborative Group et al. SLCO1B1 variants and statin-induced myopathy: a genome-wide study. N Engl J Med 359, 789-799 (2008); Sato, Y. et al. A new statistical screening approach for finding pharmacokinetics related genes in genome-wide studies. Pharmacogenomics J 9, 137-146 (2009); Crowley, J.J., Sullivan, P.F. & McLeod, H.L. Pharmacogenomic genome-wide association studies: lessons learned thus far. Pharmacogenomics 10, 161-163 (2009).
2Rasmussen HB et al. Genome-wide identification of structural variants in genes encoding drug targets: possible implications for individualized drug therapy. Pharmacogenetics and Genomics. July 2012. 22 (7): 471-483.
3Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073. *FDA.


Allele-Specific PCR cannot accurately detect SNPs1:

[Figure: allele-specific PCR schematic, labeled "Unknown SNP"]

1Favis, R. Applying next generation sequencing to pharmacogenomics studies in clinical trials.


High throughput genotyping platforms cannot accurately resolve
allelic variants of the CYP2D6 superfamily1:
Genome-wide arrays, some that are specifically configured to examine
pharmacogene variants, were poor at discriminating CYP2D6 alleles:

1Gamazon ER et al. The limits of genome-wide methods for pharmacogenomics testing. Pharmacogenetics and Genomics. 2012. 22:261–272.


Some important points about Next Generation Sequencing (NGS):
• All methods to determine human genome variation contain error.
• All ‘short read’ NGS methods rely on the use of a “reference genome” as ground truth, when the
various reference genomes have been shown to have unusual variation1.
• Short read NGS technology is fraught with errors, and thus requires either 60-100 fold coverage
for a single individual or low coverage whole genome sequence data from a large population2.
The most accurate results have been obtained from sequencing the whole genomes of closely
related individuals, along with inclusion of other data related to family medical history1,3.
• Short read NGS technology is especially poor at calling variants in GC-rich regions of the genome
such as CpG islands.
• The real value is provided by long read technology, which has been implemented by Complete
Genomics, but they have a backlog of genomes to sequence under contract (~27,354 as of 6/12).
• So-called ‘clinical’ or bench-top sequencers, such as Illumina’s MiSeq or Life Technologies’ Ion
Torrent, manifest all the problems associated with short read technology, including extensive
pre-processing of tissue samples and complex data analysis.
1Dewey et al. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS
Genet. 2011 September; 7(9): e1002280.
2Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
3Patel CJ et al. Data-driven integration of epidemiological and toxicological data to select candidate interacting genes and environmental factors in association with disease. Bioinformatics. 2012 Jun 15;28(12):i121-i126.
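
Why genotyping a single genome needs deep coverage can be illustrated with a Poisson depth model. This sketch is an illustration under stated assumptions, not the analysis in the cited papers: it ignores sequencing and mapping errors (which push the required coverage higher still), and the threshold of 8 reads per allele is a hypothetical choice.

```python
import math

def poisson_tail(min_count, mean):
    """P(X >= min_count) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** j / math.factorial(j)
                for j in range(min_count))
    return 1.0 - below

def het_call_probability(coverage, min_reads_per_allele=8):
    """Chance that BOTH alleles of a heterozygous site reach the read threshold.

    Each allele is assumed to receive Poisson(coverage / 2) reads independently.
    """
    per_allele = poisson_tail(min_reads_per_allele, coverage / 2.0)
    return per_allele ** 2

for cov in (10, 30, 60, 100):
    print(f"{cov:>3}x coverage: P(het site callable) = {het_call_probability(cov):.4f}")
```

Because a genome contains millions of heterozygous sites, even a small per-site failure probability leaves many sites uncalled, which is one reason low-coverage designs lean on population-level information instead.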


Whole genome sequencing & analysis has been able to resolve pharmacogene variation on a
genome-wide level, including the various alleles of the CYP2D6 superfamily1:
Allele | Effect on Metabolism | Allele | Effect on Metabolism | Allele | Effect on Metabolism
*1 | Fully functional | *14 | Null | *33 | Fully functional
*2 | Fully functional | *14A | Null | *35 | Fully functional
*3 | Null | *14B | Null | *36 | pseudogene
*4 | Null | *15 | Null | *37 | Reduced activity
*5 | Null | *16 | Null | *38 | Null
*6 | Null | *17 | Reduced activity | *39 | pseudogene
*7 | Null | *18 | Null | *40 | Null
*8 | Null | *19 | Null | *41 | Reduced activity
*9 | Reduced activity | *20 | Null | *42 | Null
*10 | Reduced activity | *25 | pseudogene | *43 | pseudogene
*10AB | Reduced activity | *26 | pseudogene | *44 | Null
*11 | Reduced activity | *29 | Reduced activity | *45 | Reduced activity
*12 | Null | *30 | pseudogene | *46 | Reduced activity
*13 | Null | *31 | pseudogene | *56 | Reduced activity
1Black JL et al. Frequency of undetected CYP2D6 hybrid genes in clinical samples: Impact on phenotype prediction. Drug Metab Dispos June 2012 40:1238; Patents: United States Patent Application 20120088247.
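
In downstream analysis, a table like this is typically consulted as a lookup from star allele to functional effect. The sketch below encodes a subset of the rows above. The dictionary name, the function name and the "Unknown" fallback are illustrative assumptions, and the code only reports per-allele effects. It does not predict a metabolizer phenotype.

```python
# Subset of the allele table above (allele -> effect on metabolism).
CYP2D6_EFFECT = {
    "*1": "Fully functional",
    "*2": "Fully functional",
    "*3": "Null",
    "*4": "Null",
    "*5": "Null",
    "*9": "Reduced activity",
    "*10": "Reduced activity",
    "*17": "Reduced activity",
    "*36": "pseudogene",
    "*41": "Reduced activity",
}

def describe_diplotype(diplotype):
    """Return the tabulated effect of each allele in a diplotype such as '*1/*4'."""
    alleles = diplotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles separated by '/', got {diplotype!r}")
    # Alleles missing from the lookup are reported as "Unknown" rather than guessed.
    return [(allele, CYP2D6_EFFECT.get(allele, "Unknown")) for allele in alleles]

print(describe_diplotype("*1/*4"))
print(describe_diplotype("*10/*99"))
```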

Trends in Next Generation Sequencing
Year | 2010 | 2013
Generation | 2nd Generation NGS | 3rd Generation NGS
Fundamental technology | SBS (sequencing by synthesis) or degradation | Direct physical inspection of the DNA molecule using nanopore, high speed camera and/or silicon chip technology
Resolution | Averaged across many copies of the DNA molecule being sequenced | Single-molecule resolution
Raw read accuracy | High, with >60-fold coverage | High, missed variant calls: 1 in 500kb – 1M bases
Read length | Short (~35 bases), generally much shorter than Sanger sequencing | Long, 10,000 bp and longer
Throughput | High | Highest
Current cost | Low cost per base | Lowest cost per base
RNA-sequencing | cDNA sequencing | Direct RNA sequencing and cDNA sequencing
Start-to-Finish | Days | One hour per whole genome
Sample preparation | Complex, library and PCR amplification required | Very simple
Data analysis | Complex because of large data volumes and because short reads complicate assembly and alignment algorithms | Complex because of large data volumes; however, those can be solved by new high speed camera and chip technologies
Primary results | Base calls with quality values | Base calls with quality values, other base information such as kinetics, structural variants and phased haplotypes
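
The read-length row has a direct arithmetic consequence for data handling, since coverage = (number of reads × read length) / genome size. The sketch below works this through for the two read lengths in the table. The 3.1 Gb genome size is an approximate, assumed value, and applying the same 60-fold coverage to both generations is an assumption made only for comparison.

```python
import math

GENOME_SIZE_BP = 3.1e9  # approximate human genome size (assumption)

def reads_needed(coverage, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Number of reads required so that reads * read_length / genome size = coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

short_reads = reads_needed(coverage=60, read_length_bp=35)
long_reads = reads_needed(coverage=60, read_length_bp=10_000)
print(f"35 bp reads at 60x     : {short_reads:,} reads")
print(f"10,000 bp reads at 60x : {long_reads:,} reads")
print(f"ratio                  : {short_reads / long_reads:,.0f}x more short reads")
```

The several-hundred-fold difference in read count is one way to see why short reads complicate alignment and assembly.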


Trends in Next Generation Sequencing
2nd Generation NGS - Short read archive:
• Hardware and Service Companies – Market Share: as of 10/1/11, Illumina and Complete Genomics had sequenced over 90% of all genomes.1
[Figure: Percentage of Whole Human Genomes Sequenced, by company: Illumina, Complete Genomics, Life Technologies, Others]
• Concordance of variant calls – Illumina versus Complete Genomics short read1

Concordance between platforms (one individual, 76-fold coverage, ~3.7M SNPs):
SNPs: 88.1%
Indels: 26.5%
1Lam HL et al. Performance comparison of whole-genome sequencing platforms. Nature Biotech. 2012. 30: 78-82.
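
Platform concordance of this kind can be computed by treating each platform's variant calls as a set. The sketch below uses one common definition: calls shared by both platforms divided by calls made by either. The slide does not state whether Lam et al. used exactly this definition, so treat it as an assumption. The variant tuples are toy data.

```python
def concordance(calls_a, calls_b):
    """Fraction of all calls (union) on which both platforms agree (intersection)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        raise ValueError("no variant calls supplied")
    return len(a & b) / len(union)

# Toy variant calls keyed by (chromosome, position, ref allele, alt allele).
platform_1 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")}
platform_2 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 900, "T", "C")}

print(f"SNP concordance: {concordance(platform_1, platform_2):.1%}")
```

Keying each call on position plus both alleles means that two platforms reporting different alternate alleles at the same site count as discordant.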


Next Generation Sequencing – Update 6/12
Company: [Illumina]
Product(s): HiSeq; MiSeq clinical sequencer* (*FDA-approved Type III device)
Tech: 2nd generation, short read
Problems: Too expensive; should have taken buyout from Roche; dominate market and believe they can also dominate MDx.
Prognosis: Will eventually be acquired at bargain price, or merge; best candidate for M&A is BGI.

Company: [Complete Genomics]
Product(s): Sequencing-as-a-service
Tech: 2nd generation, short read (75% of business); 3rd generation (25%)
Problems: Just laid off 55 employees; restructuring so as to focus only on clinical markets, with no more life sciences research. Need to switch to long read technology ASAP, but can’t because of sequence backlog.
Prognosis: Long read technology is very accurate, but have “over-committed”, including Mayo, ARUP, INOVA, Partners, etc. Will survive …

Company: [Life Technologies]
Product(s): Personal genome; Exome machine
Tech: 2nd generation, short read
Problems: Tiny market share; already pushed back dates on Ion Torrent Exome to 9/12.
Prognosis: Company is diversified enough to subsidize sequencing hardware.

Company: [Oxford Nanopore]
Product(s): GridION and MinION
Tech: 3rd generation, long read, licensed from Winters-Hilt
Problems: No credibility; USB mini-pore can only sequence one genome in closed system; expensive.
Prognosis: Long read technology is accurate; company has over $150M funding. Who knows?

Company: [unidentified]
Product(s): Not named yet
Tech: 3rd generation, long read, licensed from Winters-Hilt
Problems: “Still working on the chemistry”. CEO won’t discuss status of company…
Prognosis: Long read technology is very accurate, represents optimal solution; will survive.

NGS – Complete Genomics, Inc.


NGS – Long Read Nanopore Solutions
Complete Genomics

1. Extract and fragment DNA
2. Each base (A, C, G, T) tagged with a different fluorochrome
3. Multi-planar graphene array
4. High-speed CCD camera – can capture every base per pixel with DNA traveling at ~10 base pairs per second.

Their most recent technology involves combining a very high speed CCD (charge-coupled device) camera with each DNA base tagged with a fluorochrome coming through a nanopore.

• They have achieved 500Kb read lengths, and claim the error rate is “1 missed base call variant every 500Kb” – Lee Hood.
• They have been able to resolve phased maternal and paternal chromosomes.
• They can resolve distributed repeats (e.g. pseudogenes).
• However, their in-house pre- and post-processing steps are very complex and time-consuming, their turnaround time for a human genome with a coverage of 10-fold is 72 days, and they now have a backlog of 25,000 genomes.

NGS – Long Read Nanopore Solutions
Ideal System1

1. Extract DNA.
2. Pass “naked” DNA through graphene nanopore array.
3. High bandwidth CMOS pre-amplifier positioned under every pore.
4. Solid state silicon nitride membrane chip mounted in the fluid cell.

The latest device from Rosenstein et al.1 can accurately sequence 1 million base pairs of double-stranded DNA without error.

• Most researchers interested in using nanopores to directly sequence DNA (such as Oxford Nanopore Technology) have slowed the DNA velocity in the nanopore translocation stage by adding an enzyme ratchet, to accommodate the low bandwidths available. These researchers instead used complementary metal-oxide semiconductor (CMOS) processing and integrated circuits technology.
• They have been able to redesign their system to increase the bandwidth above 50MHz, with very low noise (a high signal-to-noise ratio), to sequence an entire human genome with very little sample preparation in 20 minutes.

1Rosenstein JK et al. Integrated nanopore sensing platform with sub-microsecond temporal resolution. Nature Methods. 2012. 9 (5): 487-492.
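
The 20-minute figure implies an aggregate read rate that is easy to work out. The sketch below is only a back-of-envelope check: the 3.1 Gb genome size is an approximate, assumed value, a single pass (1-fold coverage) is assumed, and the per-pore rates are hypothetical because the slide does not state one.

```python
GENOME_SIZE_BP = 3.1e9   # approximate human genome size (assumption)
RUN_TIME_S = 20 * 60     # the 20-minute figure quoted on the slide

def aggregate_rate(genome_size_bp=GENOME_SIZE_BP, run_time_s=RUN_TIME_S, coverage=1):
    """Bases per second the whole device must read to finish in `run_time_s`."""
    return coverage * genome_size_bp / run_time_s

def pores_required(per_pore_rate_bp_s, **kwargs):
    """Parallel pores needed if each pore reads `per_pore_rate_bp_s` bases per second."""
    return aggregate_rate(**kwargs) / per_pore_rate_bp_s

rate = aggregate_rate()
print(f"Aggregate rate for 1x in 20 min: {rate:,.0f} bases/s")

# Hypothetical per-pore rates, chosen only to show how the requirement scales.
for per_pore in (1e3, 1e5, 1e6):
    print(f"  at {per_pore:,.0f} bases/s per pore: {pores_required(per_pore):,.1f} pores")
```

Any coverage above 1-fold scales the requirement proportionally, via the `coverage` argument.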

WGA – Clinical Interpretation Software
Whole Genome Analysis - “The $1,000 genome and the $1M interpretation.”

3 major approaches:

• Filter data followed by complex analysis – Used by Cypher Genomics and Illumina

• Apply proprietary natural language processing algorithms against whole
genome or whole exome data – Used by Silicon Valley Biosystems

• Genomic best linear unbiased prediction (GBLUP) method to evaluate
predictive ability by cross-validation. GBLUP approaches take into account the
covariance structure inferred from the genomic data. Best predictive
accuracy1,2
1Ober U et al. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster.
PLoS Genetics. May 2012. 8 (5): 1-14.
2Jones B. Predicting phenotypes. Nature Reviews Genetics. 2012. 13. doi:10.1038/nrg3267
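
The GBLUP approach in the third bullet can be sketched in a few lines. This is a minimal illustration on simulated data, not the pipeline used by Ober et al. The relationship matrix follows the widely used VanRaden construction. The variance ratio `lam` is assumed known here, although in practice it is estimated (for example by REML). All function names and simulation settings are illustrative assumptions.

```python
import numpy as np

def genomic_relationship(Z):
    """Genomic relationship matrix (covariance structure) from 0/1/2 genotype codes."""
    p = Z.mean(axis=0) / 2.0                 # allele frequency per marker
    W = Z - 2.0 * p                          # centre each marker
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, lam):
    """Predict phenotypes of `test` individuals from `train` individuals.

    lam is the ratio of residual to genetic variance (assumed known here).
    """
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def cross_validated_ability(G, y, lam, n_folds=5, seed=0):
    """Correlation between held-out predictions and observed phenotypes."""
    order = np.random.default_rng(seed).permutation(len(y))
    predicted = np.empty(len(y))
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)    # everyone not in the held-out fold
        predicted[fold] = gblup_predict(G, y, train, fold, lam)
    return float(np.corrcoef(predicted, y)[0, 1])

# Simulated toy data: 300 individuals, 100 markers, heritability 0.6.
rng = np.random.default_rng(1)
n, m, h2 = 300, 100, 0.6
Z = rng.binomial(2, rng.uniform(0.1, 0.5, size=m), size=(n, m)).astype(float)
genetic = (Z - Z.mean(axis=0)) @ rng.normal(size=m)
noise = rng.normal(scale=np.sqrt(genetic.var() * (1 - h2) / h2), size=n)
y = genetic + noise

G = genomic_relationship(Z)
ability = cross_validated_ability(G, y, lam=(1 - h2) / h2)
print(f"Cross-validated predictive ability (r): {ability:.2f}")
```

Predictive ability is reported as the correlation between predictions and phenotypes for individuals left out of the fit, which is the cross-validation idea named in the bullet.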


Whole Genome Analysis - Example from Cypher Genomics


Lab & Technology Operations

Lab
• Results delivered within one business day of
receipt of a patient’s DNA sample
• CLIA certified
• CAP accredited
• NY State Department of Health certified

Technology
• Advanced bioinformatics
• World-class data center operations
• Secure Internet protocols
• HIPAA compliant architecture
• Data integration with Facility Health Information
Management Systems

