ISCB ECCB BD2K keynote Kibbe 201707
1. Data Commons in the age of
Precision Medicine
Warren Kibbe, PhD
warren.kibbe@nih.gov
@wakibbe
July 23rd, 2017
2. Outline of
Presentation
NCI Background
Cancer Moonshot
Open Science
NCI MATCH
Proteomics, Genomics,
Imaging, Clinical Phenotype
NCI Data Commons
3. Personal &
Professional
Background
PhD in chemistry,
Caltech, Postdoc in
molecular genetics of
RAS
Cancer research for 20+
years
Cancer informatics,
software, healthcare
Director NCI CBIIT since
2013; Acting NCI Deputy
Director since 2016
Lost three grandparents
to cancer
5. NCI Mission
To develop the scientific
knowledge base that will
lessen the burden of
cancer in the United States
and around the world.
6. In 2016 there were an estimated
1,700,000 new cancer cases
and
600,000 cancer deaths
- American Cancer Society
Cancer remains the second most common cause of
death in the U.S.
- Centers for Disease Control and Prevention
7. In 2016 there were an estimated
15,500,000
cancer survivors in the U.S.
8. Understanding Cancer
Precision medicine will lead to fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment and clinical
presentation and direct effective, evidence-based
prevention and treatment.
9. The Beau Biden Cancer Moonshot
• Accelerate progress in cancer,
including prevention & screening
• From cutting edge basic research to
wider uptake of standard of care
• Encourage greater cooperation
and collaboration
• Within and between academia,
government, and private sector
• Enhance data sharing
10. Cancer Moonshot Data & Technology Team
Co-Chairs: Dimitri Kusnezov (DOE), DJ Patil (OSTP), and Jerry Lee (OVP)
Members:
• John Scott (DoD)
• Craig Shriver (DoD)
• Cheryll Thomas (CDC)
• Frances Babcock (CDC)
• Teeb Al-Samarrai (DOE)
• Sean Khozin (FDA)
• Alexandra Pelletier (PIF)
• Maya Mechenbier (OMB)
• Henry Rodriguez (NCI)
• Karen Cone (NSF)
• Michael Kelley (VA)
• Louis Fiore (VA)
• Warren Kibbe (NCI)
• Betsy Hsu (NCI)
• Niall Brennan (CMS)
• Thomas Beach (USPTO)
• Claudia Williams (OSTP)
• Vikrum Aiyer (USPTO)
• Tom Kalil (OSTP)
• Kathy Hudson (NIH)
• Dina Paltoo (NIH)
• Al Bonnema (DoD)
• Michael Balint (PIF)
• Kara DeFrias (OVP)
• Greg Pappas (FDA)
• Erin Szulman (OSTP)
• Paula Jacobs (NCI)
11. Blue Ribbon Panel: Members & Working Groups
• 28 Members
• Clinicians, researchers, advocates, pharma and tech industries
• Three face-to-face meetings to identify “Moonshot” recommendations
• 7 Working Groups
• Clinical trials, enhanced data sharing, cancer immunology, tumor evolution, implementation science, pediatric cancer, precision prevention and early detection
• Met weekly for 6 weeks to generate 2-3 recommendations per working group
• More than 150 people were part of the working groups
12. Co-Chairs
Tyler Jacks*
MIT
Elizabeth Jaffee*
Johns Hopkins
Dinah Singer
NCI
Peter C. Adamson*
Children's Hospital of Philadelphia
James Allison
MD Anderson
David Arons
National Brain Tumor Society
Mary Beckerle
Univ. of Utah
Mitchel Berger*
UCSF
Jeffrey Bluestone
Parker Institute
Chi Dang*
U. Penn
Mikael Dolsten
Pfizer
*NCAB/BSA member
Augusto Ochoa
Louisiana State Univ.
Jennifer Pietenpol
Vanderbilt Univ.
Angel Pizarro
Amazon Web Services
Barbara Rimer
UNC
Charles Sawyers*
MSK
Ellen Sigal
Friends of Cancer Research
Patrick Soon-Shiong
NantWorks
Wai-Kwan Alfred Yung
MD Anderson
James Downing
St. Jude Hospital
Levi Garraway
Harvard Medical School
Gad Getz
Broad Institute
Laurie Glimcher
Weill Cornell
Lifang Hou
Northwestern
Neal Kassell
Univ. Va.
Elena Martinez*
UCSD
Deborah Mayer
UNC
Edith Mitchell
Thomas Jefferson Univ.
Blue Ribbon Panel
13. BRP Working Groups
Working Group | Co-Chairs | NCI Staff
Cancer Immunology | Liz Jaffee, Jim Allison | Toby Hecht, Kevin Howcroft
Precision Prevention and Early Detection | Mary Beckerle, Jennifer Pietenpol | Elisa Woodhouse, Tracy Lively
Tumor Evolution | Chi Dang, Levi Garraway | Joanna Watson, Suresh Mohla, Tony Dickherber
Clinical Trials | Charles Sawyers, Mitch Berger | Jeff Hildesheim, Meg Mooney
Implementation Sciences | Elena Martinez, Augusto Ochoa | Bob Croyle, Worta McCaskill-Stevens, Jennifer Couch
Pediatric Cancer | Peter Adamson, Jim Downing | Judy Mietz, Malcolm Smith
Enhanced Data Sharing | Angel Pizarro, Gaddy Getz | Juli Klemm, Betsy Hsu
16. Vision:
Enable the creation of a Learning Healthcare System
for Cancer, where as a nation we learn from the
contributed knowledge and experience of every
cancer patient. As part of the Cancer Moonshot, we
want to unleash the power of data to enhance,
improve, and inform the journey of every cancer patient
from the point of diagnosis through survivorship.
17. How do we solve problems in Cancer?
Support and incentives for team science, collaboration
We need FAIR, open data
Support open source, open science
Support for rapid innovation
18. Data Sharing and the FAIR Principles
FAIR –
Making data
Findable,
Accessible,
Attributable,
Interoperable,
Reusable,
and provide Recognition
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
19. 2006-2015:
A Decade of Illuminating the
Underlying Causes of Primary
Untreated Tumors Omics
Characterization
(10,000+ patient tumors and increasing)
Courtesy of P. Kuhn (USC)
Cancer is a grand challenge
Requires:
Deep biological understanding
Advances in scientific methods
Advances in instrumentation
Advances in technology
Data and computation
Mathematical models
Cancer Research and Care generate
detailed data that is critical to
create a learning health system for cancer
21. http://cancerimagingarchive.net
• 33,000 total subjects
in the archive
• 67 data sets currently
available
• 21 from The Cancer
Genome Atlas project
• 10 from the Quantitative
Imaging Network
• Clinical trial data from
ECOG-ACRIN and
RTOG
22. Cancer
Genomics
Several distinct molecular
forms of cancer at each
organ site
The genomic
abnormalities of each
cancer are unique
The same molecular
abnormalities are found
in cancers that arise in
different organs
23. Application of Cancer Genomics is changing
https://www.cancer.gov/about-cancer/treatment/clinical-trials/nci-supported/nci-match
25. NCI-MATCH Success
Revised 05/30/2017
• In June, the trial reached its goal to
sequence the tumors of 6,000 patients,
nearly two years early
• Its availability through more than 1100
participating sites reflects the broad
interest in the promise of genomics, and
the ability of such a study to deliver that
promise to the community
26. NCI-MATCH Important Discovery
In patients tested so far, the
tumor gene variants we are
studying are less common
than expected, from 3.47
percent to zero
27. NCI-MATCH Important Discovery (cont’d)
• Prevalence rates for many tumor gene abnormalities are
lower than expected – for several of the treatment arms to
reach their 35-patient goal, tens of thousands of patients
need to be screened
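The screening burden on this slide follows from simple arithmetic: if a variant occurs in a fraction p of screened patients, filling an arm of n patients takes at least n / p screened patients. The sketch below is a minimal illustration, assuming every patient found with the variant enrolls (an optimistic simplification not stated on the slides). The function name and the 0.1 percent example prevalence are illustrative assumptions; the 3.47 percent figure and the 35-patient goal come from the slides.

```python
import math
from fractions import Fraction


def patients_to_screen(arm_goal: int, prevalence: str) -> int:
    """Lower bound on patients screened to fill one treatment arm.

    prevalence is a decimal string (e.g. "0.0347") so the division is exact.
    Assumes every patient found with the variant enrolls (hypothetical).
    """
    p = Fraction(prevalence)
    if p <= 0:
        raise ValueError("a variant with zero prevalence can never fill an arm")
    return math.ceil(Fraction(arm_goal) / p)


most_common = patients_to_screen(35, "0.0347")  # highest prevalence on the slide
rare = patients_to_screen(35, "0.001")          # illustrative 0.1 percent variant
print(most_common, rare)
```

Under these assumptions the most common variant needs about a thousand screened patients, while a 0.1 percent variant already needs 35,000, which is in line with the "tens of thousands" on the slide.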
28. MATCH and Precision Oncology
It isn’t just about matching patients to therapy; it
is also about avoiding therapies that will not
work.
Biology is complex, and we still have a lot of
basic biology to understand
Genomics+imaging+clinical labs is the first wave
of precision oncology
32. NCI GENOMIC DATA COMMONS
LAUNCHED AT ASCO ON JUNE 6, 2016
https://gdc-portal.nci.nih.gov
2.6 PB of legacy data and 1.5 PB of harmonized data.
33. Genomic Data Commons
The Cancer Genomic Data Commons
(GDC) is an existing effort to standardize
and simplify submission of genomic data
to NCI and follow the principles of FAIR
– Findable, Accessible, Attributable,
Interoperable, Reusable, and Provide
Recognition.
The GDC is part of the NIH Big Data to
Knowledge (BD2K) initiative and an
example of the NIH Commons
Microattribution, nanopublications, tracking the use of
data, annotation of data, and use of algorithms support
the data/software/metadata life cycle, providing
credit and enabling analysis of the impact of data, software,
analytics, algorithms, curation and knowledge sharing
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
34. GDC Acknowledgements
Univ. of Chicago
Bob Grossman
Allison Heath
Mike Ford
Zhenyu Zhang
NCI Center for Cancer Genomics
Lou Staudt
Zhining Wang
Martin Ferguson
JC Zenklusen
Daniela Gerhard
Deb Steverson
Ontario Institute for Cancer Research
Vincent Ferretti
Francois Gerthoffert
JunJun Zhang
Leidos Biomedical Research
Mark Jensen
Sharon Gaheen
Himanso Sahni
NCI CBIIT
Tony Kerlavage
Tanja Davidsen
35. AT THE JUNE 29TH CANCER MOONSHOT SUMMIT, FOUNDATION
MEDICINE ANNOUNCED THE RELEASE OF 18,000 GENOMIC
PROFILES TO THE NCI GDC
36. • MMRF is the first non-profit organization to
upload information to the GDC
• Among its contributions will be data from the Relating
Clinical Outcomes in MM to Personal Assessment
of Genetic Profile (CoMMpass) study, which began
in 2011 and has thus far enrolled over 1,150
patients
• Over the next eight years, patients in CoMMpass
will get a repeat biopsy and a new genomic
analysis at each six-month checkup and/or at
disease progression
• Tumor samples are being collected and analyzed
when possible at the time of any relapse. New data
will be deposited every six months at a
minimum
38. NCI Cancer Genomics Cloud Pilots
Democratize access to
NCI-generated genomic
and related data, and to
create a cost-effective
way to provide scalable
computational capacity
to the cancer research
community.
Cloud Pilots provide:
• Access to large genomic data sets without need to download
• Access to popular pipelines and visualization tools
• Ability for researchers to bring their own tools and pipelines to the data
• Ability for researchers to bring their own data and analyze in combination with existing genomic
data
• Workspaces, for researchers to save and share their data and results of analyses
39. Three NCI Genomics Cloud Pilots
Broad Institute
• PI: Gad Getz
• Google Cloud
• Firehose in the cloud including Broad best practices workflows
• http://firecloud.org
Institute for Systems Biology
• PI: Ilya Shmulevich
• Google Cloud
• Leverage Google infrastructure; Novel query and visualization
• http://cgc.systemsbiology.net/
Seven Bridges Genomics
• PI: Deniz Kural
• Amazon Web Services
• Interactive data exploration; > 30 public pipelines
• http://www.cancergenomicscloud.org
Timeline phases: Selection, Design/Build I, Design/Build II, Evaluation, Extension
Timeline dates: Jan 2014, Sept 2014, April 2015, Jan 2016, Sept 2016
40. Workspace –
isolated environment for collaborative analysis
Data + Methods → Results
sample data and
metadata (e.g.
BAMs, tissue type)
algorithms
(e.g. mutation
calling)
Wiring logic
(e.g. use the exome
capture BAM)
executions and results
(e.g. run mutation caller v41
on this exact bam and track
results)
Slide courtesy of Broad Institute
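The workspace idea on this slide (data plus methods, joined by wiring logic, yielding tracked executions) can be sketched as a small data model. This is a hypothetical illustration only, not the FireCloud API: every class, field, and file name below is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sample:
    """Sample data and metadata (e.g. BAMs, tissue type)."""
    sample_id: str
    files: Dict[str, str]      # hypothetical keys, e.g. "exome_capture_bam"
    metadata: Dict[str, str]


@dataclass
class Method:
    """A versioned algorithm (e.g. a mutation caller)."""
    name: str
    version: str
    run: Callable[[str], str]  # takes an input path, returns a result


@dataclass
class Execution:
    """Record of one run: which method version ran on which exact input."""
    sample_id: str
    method: str
    version: str
    input_file: str
    result: str


@dataclass
class Workspace:
    samples: List[Sample] = field(default_factory=list)
    executions: List[Execution] = field(default_factory=list)

    def run(self, method: Method, input_key: str) -> List[Execution]:
        """Wiring logic: input_key picks which file of each sample feeds the method."""
        new_records = []
        for sample in self.samples:
            path = sample.files[input_key]
            new_records.append(
                Execution(sample.sample_id, method.name, method.version,
                          path, method.run(path))
            )
        self.executions.extend(new_records)
        return new_records


# Toy usage with made-up names: run "mutation caller v41" on the exome capture BAM.
ws = Workspace(samples=[
    Sample("S1", {"exome_capture_bam": "s1.exome.bam"}, {"tissue_type": "tumor"}),
])
caller = Method("mutation_caller", "v41", lambda bam: f"calls for {bam}")
records = ws.run(caller, "exome_capture_bam")
```

Keeping the execution record separate from the method and the sample is what lets a result be traced back to the exact input and method version that produced it.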
41. Cancer Genomics Cloud Project Teams
CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D. - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D. - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D. - Seven Bridges - http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D. - Project Officer
• Juli Klemm, Ph.D. - COR, Broad Institute
• Tanja Davidsen, Ph.D. - COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA - COR, Seven Bridges Genomics
GDC Principal Investigators
• Robert Grossman, Ph.D. - University of Chicago
• Allison Heath, Ph.D. - University of Chicago
• Vincent Ferretti, Ph.D. - Ontario Institute for Cancer Research
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.
42. The NCI Cancer Research Data Commons
A virtual, expandable infrastructure
Standardized data submission and
Q/C
Controlled vocabularies
Harmonization by subject matter
experts
[Diagram: NCI Cancer Research Data Commons]
Data types: Genomic (GDC), Proteomic, Imaging, Clinical, Functional, Cancer Models, Population
Users: Data Contributors; Biologists / Clinical Researchers; Clinicians and Patients; Tool / Algorithm Developers; Computational Scientists
Access: Authentication & Authorization
43. NCI Cancer Research Data Commons: An Individual Node
[Diagram labels: Node A; Cloud X; Cloud Y; Data Contributors; Data Submission; Data Mirroring]
44. Improve understanding of the effectiveness of cancer
treatment in the “real world” through automation
SEER Precision Cancer
Surveillance
Surveillance data captured/ planned on each cancer patient for the entire population
Pathology
Molecular
Characterization
Detailed Initial
Treatment
Detailed
Subsequent
Treatment
Survival
Cause of Death
Progression
Recurrence
Complement trials to support
development of new diagnostics
and treatments
Understand treatment and
improve outcomes in the
“real world”
Genome
Demographics
48. Biology and Medicine are now data-intensive enterprises
Scale is rapidly changing
Technology, data, computing and IT are
pervasive in the lab, the clinic, in the
home, and across the population
52. Expert Systems vs Machine Learning
In 1945, the British philosopher Gilbert Ryle
identified two kinds of knowledge: factual,
propositional knowledge that can be ordered into
rules (“knowing that”) versus implicit,
experiential, skill-based knowledge (“knowing how”).
Machine Learning is based on ‘knowing how’.
Expert systems, or rule-based machines, are
based on ‘knowing that’.
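The contrast can be made concrete with a toy example. In this minimal sketch, the expert system is a hand-written rule with a fixed cut-off, and the machine learning side picks its cut-off from labeled examples. The measurements, labels, and the 5.0 threshold are all invented for illustration and carry no clinical meaning.

```python
from typing import List, Tuple


def expert_rule(value: float, threshold: float = 5.0) -> bool:
    """'Knowing that': an explicit rule with a cut-off chosen by a human."""
    return value >= threshold


def learn_threshold(examples: List[Tuple[float, bool]]) -> float:
    """'Knowing how': choose the cut-off that misclassifies the fewest examples."""
    def errors(t: float) -> int:
        return sum((value >= t) != label for value, label in examples)

    candidates = sorted(value for value, _ in examples)
    return min(candidates, key=errors)


# Hypothetical labeled measurements: (value, is_positive).
examples = [(1.0, False), (2.0, False), (3.5, True), (4.0, True), (6.0, True)]
learned = learn_threshold(examples)

# The fixed rule and the learned rule disagree on a borderline value.
fixed_says = expert_rule(4.0)
learned_says = expert_rule(4.0, learned)
```

The rule-based version encodes what an expert already believes; the learned version recovers its cut-off from experience and changes when the examples change.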
53. Human Cognition
Three kinds of learning:
Learning that – rule-based knowledge
Learning how – experiential knowledge
Learning why – integrative, explanatory knowledge
57. DATA SHARING PLEDGE …
“leading research centers that have pledged to
make genomic & proteomic datasets available to
the public to advance cancer care”
10 MOUs / 11 countries / 18 institutions
ICPC (International Cancer Proteogenome Consortium)
58. Integrated data sets, interoperable
resources, harmonized data are
necessary for and enable
biologically informed cancer
computational predictive models
61. NIH Genomic Data Sharing Policy
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets