Precision medicine in oncology aims to provide targeted cancer treatments based on a patient's individual tumor characteristics. The presentation discusses precision oncology initiatives including NCI-MATCH clinical trials which assign cancer therapies based on a tumor's molecular abnormalities rather than location. It outlines plans to expand genomically-based cancer trials, understand and overcome treatment resistance through molecular analysis, and establish a national cancer database integrating genomic and clinical data to accelerate cancer research. Cloud computing platforms are being developed to provide researchers access to large cancer genomic and clinical datasets. The goal is to advance precision cancer treatment by incorporating individual patient genetics and biomarkers into therapeutic decision making.
1. Precision Medicine in
Oncology – Cancer Informatics
October 20th, 2015
Warren Kibbe, PhD
NCI Center for Biomedical Informatics
2. 2
1. Precision Medicine
2. Pre-clinical models
3. TCGA / TCIA
4. Genomic Data Commons
5. Cloud Pilots
Slides are from many sources, but special thanks to
Drs. Harold Varmus, Doug Lowy, Jim Doroshow, Lou Staudt
5. 5
Definition of Precision Oncology
Interventions to prevent, diagnose, or treat cancer, based
on a molecular and/or mechanistic understanding of the
causes, pathogenesis, and/or pathology of the disease.
Where the individual characteristics of the patient are
sufficiently distinct, interventions can be concentrated on
those who will benefit, sparing expense and side effects
for those who will not.
Modified by D. Lowy, M.D., from IoM’s Toward Precision Medicine report, 2011
6. 6
Understanding Cancer
Precision medicine will lead to fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment and clinical
presentation and direct effective, evidence-based
prevention and treatment.
7. 7
What is Cancer?
Cancer is a disease where cells ‘lose’ the normal controls that enable
them to participate as productive, stable members of tissues, organs
and organ systems. The process of developing cancer is usually
viewed as having three phases:
Initiation, where some of the normal control processes that define the cell
type, inter-cellular communication with other cells, and normal
responses to cell-cell signaling are altered
Proliferation, where cancerous cells ‘take over’ normal processes and
begin to proliferate
Metastasis, where cancerous cells leave their normal location and
invade other tissues
8. 8
Cancer Statistics
In 2015 there will be an estimated
1,700,000 new cancer cases
and
600,000 cancer deaths
- American Cancer Society 2015
Cancer remains the second most common cause of death in the U.S.
- Centers for Disease Control and Prevention 2015
9. 9
Survivorship is on the rise
Today there are 14 million Americans alive with a history of cancer.
It is estimated that by 2024, the population of cancer survivors will
increase to almost 19 million: 9.3 million males and 9.6 million females
- American Cancer Society 2015
10. 10
The hope
We need to do what the HIV/AIDS research community did in the last
30 years: turn a certain death sentence into a managed, chronic
disease.
As with the course of HIV infection, we believe
that cancer has multiple pathways that are altered during the course of
the disease, and that targeted therapies hitting multiple involved
signaling pathways, metabolic pathways, and receptors can block and
prevent disease progression
Unlike HIV, the course of cancer is more individualized and involves
different molecular actors depending on the cancer and the individual
11. 11
The frustration
We already know how to remove >50% of the cancer burden in the U.S.
Consistent nutrition, exercise & sleep – estimated 10-30%
Tobacco control (smoking cessation) – ~25%
HPV and HCV immunization – ~20%
Behavior change is hard. Impacting the behavior of a broad population is
harder.
12. 12
Precision Medicine is not new: An early example
First disease for which molecular defect identified
Single substitution at position 6 of β-globin chain
Abnormal Hb polymerization upon deoxygenation
“I believe medicine is just now entering into a new era when progress will be much more rapid than
before, when scientists will have discovered the molecular basis of diseases, and will have discovered
why molecules of certain drugs are effective in treatment, and others are not effective.”
Linus Pauling, 1952
2015: still no good treatment
13. 13
The NCI Mission
& the PMI for
Oncology
• NCI Mission: To help people live
longer, healthier lives by supporting
research to reduce the incidence of
cancer and to improve the outlook for
patients who develop cancer
• Precision Medicine Initiative for
Oncology: Significantly expand our
efforts to improve cancer treatment
through genomics
14. 14
What Problems are We Trying to Solve?
Precision Medicine
Initiative in Oncology
• For most of its 70-year history,
systemic cancer treatment has relied
on therapies that are marginally more
toxic to malignant cells than to normal
tissues
• Molecular markers that predict
benefit, response, or resistance in the
clinic have been lacking for many
cancers
15. 15
Proposed Solution to these Problems
Use genomics and other high
throughput technologies including
imaging to identify, create
predictive signatures, and
target molecular vulnerabilities
of individual cancers
Precision Medicine
Initiative in Oncology
17. 17
Precision Oncology
Trials Launched
2014:
MPACT
Lung MAP
ALCHEMIST
Exceptional Responders
2015:
NCI-MATCH
ALK Inhibitor
MET Inhibitor
NCI-MATCH: Features
[Molecular Analysis for Therapy Choice]
•Foundational treatment/discovery
trial; assigns therapy based on
molecular abnormalities, not site of
tumor origin for patients without
available standard therapy
• Regulatory umbrella for phase II
drugs/studies from > 20 companies;
single agents or combinations
•Available nationwide (2400 sites)
18. NCI MATCH
•Conduct across 2400 NCI-supported sites
•Pay for on-study and at progression biopsies
•Initial estimate: screen 3000 patients to complete
20 phase II trials
Footnotes:
1. CR, PR, SD, and PD as defined by RECIST
2. Stable disease is assessed relative to tumor status at re-initiation of study agent
3. Rebiopsy; if additional mutations, offer new targeted therapy
19. 19
NCI-MATCH: Initial Ten Studies
Agents and targets below grey line are pending final regulatory review; economies of
scale: larger number of agents/genes, fewer overall patients to screen
Agent(s) | Molecular Target(s) | Estimated Prevalence
Crizotinib | ALK Rearrangement (non-lung adenocarcinoma) | 4%
Crizotinib | ROS1 Translocations (non-lung adenocarcinoma) | 5%
Dabrafenib and Trametinib | BRAF V600E or V600K Mutations (non-melanoma) | 7%
Trametinib | BRAF Fusions, or Non-V600E, Non-V600K BRAF Mutations (non-melanoma) | 2.8%
Afatinib | EGFR Activating Mutations (non-lung adenocarcinoma) | 1 – 4%
Afatinib | HER2 Activating Mutations (non-lung adenocarcinoma) | 2 – 5%
AZD9291 | EGFR T790M Mutations and Rare EGFR Activating Mutations (non-lung adenocarcinoma) | 1 – 2%
TDM1 | HER2 Amplification (non-breast cancer) | 5%
VS6063 | NF2 Loss | 2%
Sunitinib | cKIT Mutations (non-GIST) | 4%
Total | | ≈ 35%
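The idea of assigning therapy by molecular abnormality rather than tumor site can be illustrated with a small lookup. The following is only a sketch, not the trial's actual rules engine, whose logic the slides do not describe. The `Arm` class, the `candidate_arms` function, and the normalized alteration labels are hypothetical names introduced for illustration, and only a subset of the arms listed above is encoded.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Arm:
    agents: str                        # agent(s) as listed on the slide
    target: str                        # hypothetical normalized alteration label
    excluded_histology: Optional[str]  # tumor type excluded from this arm, if any


# A subset of the arms from the slide, transcribed for illustration only.
ARMS: List[Arm] = [
    Arm("Crizotinib", "ALK rearrangement", "lung adenocarcinoma"),
    Arm("Crizotinib", "ROS1 translocation", "lung adenocarcinoma"),
    Arm("Dabrafenib and Trametinib", "BRAF V600E/K mutation", "melanoma"),
    Arm("TDM1", "HER2 amplification", "breast cancer"),
    Arm("VS6063", "NF2 loss", None),
    Arm("Sunitinib", "cKIT mutation", "GIST"),
]


def candidate_arms(alterations: Set[str], histology: str) -> List[Arm]:
    """Return arms whose target was found and whose exclusion does not apply."""
    matches = []
    for arm in ARMS:
        if arm.target not in alterations:
            continue  # the tumor does not carry this arm's target
        excluded = arm.excluded_histology
        if excluded is not None and histology.lower() == excluded.lower():
            continue  # e.g. lung adenocarcinoma is excluded from the ALK arm
        matches.append(arm)
    return matches
```

For example, `candidate_arms({"NF2 loss"}, "mesothelioma")` returns the VS6063 arm, while an ALK rearrangement in lung adenocarcinoma returns no arm because that histology is excluded. The histology strings here are made up for the example.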
20. 20
MATCH Assay: Workflow for 10-12 Day Turnaround
[Figure: assay workflow diagram]
Steps: Biopsy Received at Quality Control Center; Tissue Fixation and Path Review; Nucleic Acid
Extraction; Library/Template Prep; Sequencing and QC Checks; Centralized Data Analysis;
Clinical Laboratory aMOI Verification; aMOIs Identified; Rules Engine; Treatment Selection
Step durations shown: 1 day (four steps), 3 days, and 3-5 days, for a total of 10-12 days
QC criteria shown: tumor content >70%; DNA/RNA yields >20 ng; library yield >20 pM;
sequencing QC on test fragments, total reads, reads per BC, coverage, and NTC, positive, and
negative controls
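The numeric QC gates on this slide (tumor content >70%, DNA/RNA yields >20 ng, library yield >20 pM) can be expressed as a simple check. This is a minimal sketch under stated assumptions: the `SampleQC` class and `qc_failures` function are hypothetical, the slide does not say whether DNA and RNA yields are checked separately (a single value is assumed here), and the sequencing-level checks (reads, coverage, controls) are omitted because the slide gives no thresholds for them.

```python
from dataclasses import dataclass
from typing import List

# Thresholds transcribed from the slide; all are strict "greater than".
MIN_TUMOR_CONTENT_PCT = 70.0
MIN_NUCLEIC_ACID_NG = 20.0
MIN_LIBRARY_PM = 20.0


@dataclass(frozen=True)
class SampleQC:
    tumor_content_pct: float      # percent tumor content from path review
    nucleic_acid_yield_ng: float  # DNA/RNA yield (assumed to be one value)
    library_yield_pm: float       # library yield in pM


def qc_failures(sample: SampleQC) -> List[str]:
    """Return the names of the QC gates the sample fails (empty list = pass)."""
    failures = []
    if not sample.tumor_content_pct > MIN_TUMOR_CONTENT_PCT:
        failures.append("tumor content")
    if not sample.nucleic_acid_yield_ng > MIN_NUCLEIC_ACID_NG:
        failures.append("nucleic acid yield")
    if not sample.library_yield_pm > MIN_LIBRARY_PM:
        failures.append("library yield")
    return failures
```

Returning the list of failed gates, rather than a single boolean, makes it easier to see at which step a biopsy would drop out of the workflow.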
21. 21
The Components of the
Precision Medicine
Initiative for Oncology
• Dramatically expand the NCI-MATCH umbrella
to include new trials, new agents, new genes,
and new drug combinations
• Understand and overcome resistance to therapy
through molecular analysis and development of
new cancer models
• Increase genomics-based preclinical studies,
especially in the area of immunotherapy,
through the creation of patient-derived pre-
clinical models and non-invasive tumor profiling
• Establish the first national cancer database
integrating genomic information with clinical
response and outcome: to accelerate
understanding of cancer and improve its
treatment
22. 22
PMI-O: Expanding
Genomically-Based Cancer
Trials (FY16-FY20)
• Accelerate Launch of NCI-Pediatric MATCH
• Broaden the NCI-MATCH Umbrella:
Expand/add new Phase II trials to explore novel
clinical signals—mutation/disease context
Add new agents for new trials, and add new
genes to panel based on evolving evidence
Add combination targeted agent studies
Perform Whole Exome Sequencing, RNAseq,
and proteomic studies on quality-controlled
biopsy specimens—extent of research based on
resource availability
Add broader range of hematologic malignancies
• Perform randomized Phase II studies or hand-off to
NCTN where appropriate signals observed
• Apply genomics resources to define new predictive
markers in novel immunotherapy trials
• Expand approach to ‘exceptional responders’: focus
on mechanisms of response/resistance in pilot
23. 23
PMI-O: Understanding and
overcoming resistance to
therapy
(FY16-FY20)
• Create a repository of molecularly
analyzed samples of resistant disease
• Expand the use of tumor profiling
methods such as circulating tumor cells
(CTCs) and fragments of tumor DNA in
blood to understand and monitor disease
progression
• Develop new cancer models to identify
the heterogeneity of resistance
mechanisms
• Use preclinical modeling to determine the
effectiveness of new combinations of
novel molecularly targeted investigational
agents
24. 24
We need sophisticated
computational models to
understand patient
response, methods of
resistance, and to integrate
pre-clinical model data
Drivers for High Performance
Computing and
Computational Modeling
25. 25
Develop a Cancer
Knowledge System.
Establish a national
database that integrates
genomic information with
clinical response and
outcomes as a resource.
PMI-O: Informatics Goal
27. 27
Build multi-scale,
predictive computational
biology models for
understanding cancer
biology and informing
therapy. Develop detailed
cancer pathway models to
create targeted
combination therapies in
cancer. This approach has
transformed HIV therapy
and has the potential to do
the same in cancer
PMI-O: Informatics Goal
28. 28
The Cancer Genomic Data
Commons (GDC) is an existing
effort to standardize and
simplify submission of genomic
data to NCI and follow the
principles of FAIR: Findable,
Accessible, Interoperable,
Reusable. The GDC is part of
the NIH Big Data to Knowledge
(BD2K) initiative and an
example of the NIH Data
Commons
Genomic Data Commons
30. 30
Genomic Data Commons (GDC) – Rationale
TCGA and many other NCI funded cancer genomics projects each
currently have their own Data Coordinating Centers (DCCs)
BAM data and results stored in many different repositories; confusing
to users, inefficient, barrier to research
GDC will be a single repository for all NCI cancer genomics data
Will include new, upcoming NCI cancer genomics efforts
Store all data including BAMs
Harmonize the data as appropriate
Realignment to newest human genome standard
Recall all variants using a standard calling method
Define data sharing standards and common data elements
Will be the authoritative reference data set
Will need to scale to 200+ petabytes
31. 31
Genomic Data Commons (GDC)
First step towards development of a knowledge system for cancer
Foundation for a genomic precision medicine platform
Consolidate all genomic and clinical data from:
TCGA, TARGET, CGCI, Genomic NCTN trials, future projects
Project initiated Spring of 2014
Contract awarded to University of Chicago
PI: Dr. Robert Grossman
Go live date: Mid 2016
Not a commercial cloud
Data will be freely available for download subject to data access
requirements
32. 32
Integration with Imaging - TCIA
The Cancer Imaging Archive (TCIA) hosts imaging data, and some of its imaging
series cover a number of TCGA disease types:
Breast invasive carcinoma (BRCA)
Glioblastoma (GBM), lower grade glioma (LGG)
Head and neck squamous cell carcinoma (HNSC)
Kidney renal clear cell carcinoma (KIRC)
Ovarian serous cystadenocarcinoma (OV)
Integration through standardized RESTful APIs
Driven by the need to have better computational models
33. The NCI Cancer Genomics
Cloud Pilots
Understanding how to meet the research
community’s need to analyze large-scale cancer
genomic and clinical data
35. 35
NCI GDC and the Cloud Pilots
Working together to build common APIs
Working with the Global Alliance for Genomics and Health (GA4GH) to
define the next generation of secure, flexible, meaningful,
interoperable, lightweight interfaces
Competing on the implementation, collaborating on the interface
Aligned with BD2K and serving as a part of the NIH Commons and
working toward shared goals of FAIR (Findable, Accessible,
Interoperable, Reusable)
Exploring and defining sustainable precision medicine information
infrastructure
36. 36
Information problem(s) we intend to solve with the Precision
Medicine Initiative for Oncology
Establish a sustainable infrastructure for cancer genomic
data – through the GDC
Provide a data integration platform to allow multiple data
types, multi-scalar data, temporal data from cancer models
and patients
Under evaluation, but it is likely to include the GDC, TCIA,
Cloud Pilots, tools from the ITCR program, and activities
underway at the Global Alliance for Genomics and Health
Support precision medicine-focused clinical research
37. 37
NCI Precision Medicine Informatics Activities
As we receive additional funding for Precision Medicine,
we plan to:
Expand the GDC to handle additional data types
Incorporate lessons learned from the Cloud Pilots into the GDC
Scale the GDC from 10PB to hundreds of petabytes
Include imaging by interoperating between the GDC and the
Quantitative Imaging Network TCIA repository
Expand clinical trials tooling from NCI-MATCH to NCI-MATCH Plus
Strengthen the ITCR grant program to explicitly include precision
medicine-relevant proposals
38. 38
Project Schedule and Deliverables
[Figure: project timeline]
Phases: Selection; Design/Build I; Design/Build II; Evaluation
6 months: initial design and development
9 months: completion of design, development and implementation
9 months: provide cloud to researchers; NCI evaluations; community evaluations
Timeline markers: Jan 2014, Sep 2014, Apr 2015, Jan 2016, Sep 2016
39. 39
Three Cancer Genomics Cloud Pilots
Broad Institute
• PI: Gad Getz
• Google Cloud
• Firehose in the cloud
• http://firecloud.org
Institute for Systems Biology
• PI: Ilya Shmulevich
• Google Cloud
• Interactive visualization and analysis
• http://cgc.systemsbiology.net/
Seven Bridges Genomics
• PI: Deniz Kural
• Amazon Web Services
• > 30 public pipelines
• http://www.cancergenomicscloud.org
40. FireCloud is modeled on Firehose,
the cancer genome analysis
platform built by the Getz lab at the
Broad Institute, which supports
both small groups and major
projects (e.g. TCGA, GTEx).
FireCloud significantly expands on
Firehose’s capabilities. Firehose is
used by both production managers
for large-scale analysis and analysts
for interactive analysis, curation
and manual review of data for
publication.
Free trial workspaces
are available
for all new users!
Firehose re-born in the cloud
FireCloud is a collaboration of the Broad
Institute, University of California at
Santa Cruz and University of California
at Berkeley.
This project has been funded by the
National Cancer Institute and
National Institutes of Health.
41. 41
Pre-loaded FireCloud workspaces
FireCloud will be populated with pre-loaded workspaces, which, when cloned,
will allow users to replicate analyses from curated and published works
A sampling of pre-loaded workspaces:
Curated TCGA Tumor Type Analysis Working Group workspaces
TCGA GDAC Analysis Working Group workspaces
TCGA PanCanAtlas analysis workspace
Tutorial workspaces containing paired tumor and normal cell lines
Benchmarking data to enable users to test developing tools and methods
Synthetic BAMs for testing contamination
Synthetic BAMs for testing mutation calling
A sampling of pipelines:
TCGA Production Analysis
PCAWG Pipeline
TCGA GDAC Pipeline
42. ISB Cancer Genomics Cloud @ isb-cgc.org
TCGA Data in the Cloud
Available Now§: over 400 GB of curated data
in open-access BigQuery tables
Before the end of 2015: over 1 PB of
DNAseq, RNAseq, and SNP6 controlled-
access* data in Google Cloud Storage
Interactive Exploration
Interactively explore all metadata and open-
access data
Customize, save, and share data visualizations
with colleagues
Define, export, and share custom cohorts – based
on:
clinical information,
molecular characteristics, and/or
data type availability
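Defining a cohort from clinical information, molecular characteristics, and data type availability amounts to filtering cases on three kinds of criteria. The sketch below illustrates that idea on in-memory records. It does not use or represent the ISB-CGC interface, and the `Case` class, the `build_cohort` function, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Case:
    case_id: str                   # hypothetical identifier
    disease: str                   # clinical information, e.g. a TCGA study code
    mutated_genes: FrozenSet[str]  # molecular characteristics
    data_types: FrozenSet[str]     # data types available for this case


def build_cohort(
    cases: Iterable[Case],
    disease: Optional[str] = None,
    mutated_gene: Optional[str] = None,
    required_data_types: Iterable[str] = (),
) -> List[str]:
    """Return the IDs of cases that satisfy every criterion that was given."""
    required = frozenset(required_data_types)
    cohort = []
    for case in cases:
        if disease is not None and case.disease != disease:
            continue  # clinical filter
        if mutated_gene is not None and mutated_gene not in case.mutated_genes:
            continue  # molecular filter
        if not required <= case.data_types:
            continue  # data availability filter
        cohort.append(case.case_id)
    return cohort
```

A criterion left at its default is simply not applied, so the same function covers cohorts defined by any one, two, or all three kinds of filter.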
3rd party tools: IGV and NG-CHM
§ email info@isb-cgc.org for details
*requires dbGaP authorization,
with authentication via NIH Login
43. Programmatic Access backed by the Power of Google
Easy Access to the Data
ISB-CGC Cloud Endpoint APIs expose a REST interface for search
and retrieval, with responses returned as JSON objects
Google APIs provide direct access to BigQuery & Cloud Storage
RStudio and IPython tutorial examples available on GitHub
Use RStudio or IPython Notebooks to script your own analyses
Your own Google Cloud Platform Project
Start with $300 to spend on compute & storage – request additional
credits as needed
Invite your students, post-docs & collaborators to join your project
Upload your own data into Cloud Storage and BigQuery
Run your own algorithms and pipelines on Google VMs
Use and share methods via
Docker containers
Galaxy Workflows
Common Workflow Language
45. Finding and exploring TCGA data
Data browser enables complex
queries of 100+ metadata
properties (can also be queried
programmatically)
“Find data from cases with
(RNAseq data from BOTH
Primary tumor and Adjacent
Normal) AND (WGS on Blood
Derived Normal samples).”
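The example query quoted above is a conjunction of three requirements on each case: RNAseq from primary tumor, RNAseq from adjacent normal, and WGS from blood derived normal. A minimal sketch of that logic over in-memory file metadata follows. The record layout and the `matching_cases` function are assumptions made for illustration and do not represent the data browser's programmatic interface.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical file record: (case_id, experimental_strategy, sample_type)
FileRecord = Tuple[str, str, str]

# Every (strategy, sample type) pair a case must have to satisfy the query.
REQUIRED: Set[Tuple[str, str]] = {
    ("RNAseq", "Primary tumor"),
    ("RNAseq", "Adjacent Normal"),
    ("WGS", "Blood Derived Normal"),
}


def matching_cases(files: Iterable[FileRecord]) -> List[str]:
    """Return sorted IDs of cases that have all three required data sets."""
    available: Dict[str, Set[Tuple[str, str]]] = {}
    for case_id, strategy, sample_type in files:
        # Collect the (strategy, sample type) pairs seen for each case.
        available.setdefault(case_id, set()).add((strategy, sample_type))
    return sorted(
        case_id for case_id, have in available.items() if REQUIRED <= have
    )
```

Grouping files by case first matters because the query is about cases, not individual files: no single file can satisfy all three requirements at once.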
47. 47
Cloud Pilots: Coming your way!
Broad
• Version 1.0 – 1/20/2016
• Version 1.1 – 4/2/2016
ISB
• Pre-release – 11/15/2015 (open-access data)
• Version 1.0 – 12/20/2015
• Version 2.0 – 3/20/2016
SBG
• Early access – 11/15/2015
• Version 1 – 12/28/2015
• Version 2 – 3/28/2016
48. 48
NCI-Sponsored Evaluation Activities
Independent Testing
& Evaluation
• Highly structured test of functionality, security, load capability
• Assessment of the key strengths, weaknesses, issues, and
risks
Administrative Supplements
• Active NCI grants: R01, R21, U01, U24, P30
• Proposals due October 18
DREAM Challenge
• Launching early 2016
• Use TCGA RNA-Seq data to identify somatic mutations
Hands-On Workshop
• May 24, 2016 @ NCI Shady Grove
• Half day for each Cloud Pilot
NCI Intramural Evaluation
• Dedicated resources to support intramural investigators’
research
49. CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D - Seven Bridges – http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D –Project Officer
• Juli Klemm, Ph.D – COR, Broad Institute
• Tanja Davidsen, Ph.D – COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA – COR, Seven Bridges Genomics
GDC Principal Investigator
• Robert Grossman, Ph.D - University of Chicago
Cancer Genomics Project Teams
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.
50. 50
National Strategic Computing Initiative
Executive Order announced July 29, 2015
Create a cohesive, multi-agency strategic vision and Federal investment strategy
in high-performance computing (HPC)
Lead agencies
DOE, DoD and NSF
Deployment agencies
NASA, NIH, DHS, and NOAA
Participate in shaping future HPC systems to meet aims of respective missions and
support workforce development needs
Implications for NCI
Work cross agency with DOE and others to expand use of HPC to advance research
and clinical applications impacting cancer
51. 51
Possible NCI-DOE initiative aligned with NSCI
Three candidate pilot projects identified:
Pre-clinical Model Development and Therapeutic Evaluation
(Doroshow)
Improving Outcomes for RAS Related Cancers (McCormick)
Information Integration for Evidence-based Cancer Precision Medicine
(Penberthy)
Collaboratively developing project plans with DOE computational
scientists
52. 52
Pilot Project 1: Pre-clinical Models
Pre-clinical Model Development and Therapeutic
Evaluation
Scientific lead: Dr. James Doroshow
Key points:
Rapid evaluation of large arrays of small compounds for
impact on cancer
Deep understanding of cancer biology
Development of in silico models of biology and predictive
models capable of evaluating therapeutic potential of
billions of compounds
53. 53
Pilot Project 2: RAS Related Cancers
Improving Outcomes for RAS Related Cancers
Scientific lead: Dr. Frank McCormick
Key points:
Mutated RAS is found in nearly one-third of
cancers, yet remains untargeted with known drugs
Advanced multi-modality data integration is required for model
development
Simulation and predictive models for RAS related molecular
species and key interactions
Provide insight into potential drugs and assays
54. 54
Pilot Project 3: Evidence-based Precision Medicine
Information Integration for Evidence-based Cancer
Precision Medicine
Scientific lead: Dr. Lynne Penberthy
Key points:
Integrates population and citizen science into improving
understanding of cancer and patient response
Gather key population-wide data on treatment,
response and outcomes
Leverages existing SEER and tumor registry resources
Novel avenues for patient consent, data sharing and
participation
55. 55
Take homes
Genomic Data Sharing policy is an opportunity for standardized,
meaningful data exchange
Open APIs and RESTful interfaces enable ‘purpose focused’ data
sharing
Genomic Data Commons will be the repository for NCI-funded
genomic data
New architectures including discoverable identifiers, baked-in
provenance, ad hoc levels of versioning and micro-publication (e.g.
DOIs) enable very different incentives for data sharing
Computational models for prediction are critical to the future