Stephen Friend Molecular Imaging Program at Stanford (MIPS) 2011-08-15
1. Use of Bionetworks to Build Maps of Diseases
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ San Francisco
MIPS Seminar Series
August 15th, 2011
2. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
3. Alzheimer’s, Diabetes, Cancer, Obesity
Treating Symptoms vs. Modifying Diseases
Will it work for me?
8. WHY NOT USE
“DATA INTENSIVE” SCIENCE
TO BUILD BETTER DISEASE MAPS?
9. “Data Intensive Science”- “Fourth Scientific Paradigm”
For building: “Better Maps of Human Disease”
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Evolving Models hosted in a
Compute Space- Knowledge Expert
10. It is now possible to carry out comprehensive
monitoring of many traits at the population level
Monitor disease and molecular traits in
populations
Putative causal gene
Disease trait
11. what will it take to understand disease?
DNA RNA PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
13. How is genomic data used to understand biology?
[Figure: tumors, RNA amplification, microarray hybridization, gene index, trait]
“Standard” GWAS Approaches: Identify causative DNA variation but provide NO mechanism
Profiling Approaches: Genome-scale profiling provides correlates of disease. Many examples, BUT what is cause and effect?
“Integrated” Genetics Approaches:
• Provide unbiased view of molecular physiology as it relates to disease phenotypes
• Insights on mechanism
• Provide causal relationships and allow predictions
14. Integration of Genotypic, Gene Expression & Trait Data
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)
Causal Inference
“Global Coherent Datasets”
• population based
• 100s-1000s individuals
Chen et al. Nature 452:429 (2008) Zhu et al. Cytogenet Genome Res. 105:363 (2004)
Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005) Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
15. Constructing Co-expression Networks
Start with expression measures for the most variant genes across 100s++ samples
[Figure: gene expression by brain sample]
Establish a 2D correlation matrix for all gene pairs (Note: NOT a gene expression heatmap)

Correlation Matrix
        1     2     3     4
  1     1    0.8   0.2  -0.8
  2    0.8    1    0.1  -0.6
  3    0.2   0.1    1   -0.1
  4   -0.8  -0.6  -0.1    1

Define Threshold, e.g. >0.6 for edge

Connection Matrix
      1  2  3  4
  1   1  1  0  1
  2   1  1  0  1
  3   0  0  1  0
  4   1  1  0  1

Hierarchically cluster

Clustered Connection Matrix
      1  2  4  3
  1   1  1  1  0
  2   1  1  1  0
  4   1  1  1  0
  3   0  0  0  1

Identify modules
Network Module: sets of genes for which many pairs interact (relative to the total number of pairs in that set)
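The thresholding and grouping steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used by the authors. The correlation values are the four-gene example from the slide. Two things are assumptions: an edge is drawn when the absolute correlation is at least 0.6 (which reproduces the slide's connection matrix), and connected components stand in for the hierarchical clustering step, which is enough for a toy network of this size. The function names are hypothetical.

```python
# Correlation matrix for genes 1-4, copied from the slide.
CORR = [
    [1.0, 0.8, 0.2, -0.8],
    [0.8, 1.0, 0.1, -0.6],
    [0.2, 0.1, 1.0, -0.1],
    [-0.8, -0.6, -0.1, 1.0],
]


def connection_matrix(corr, threshold=0.6):
    """Binary matrix: 1 where |r| >= threshold (assumed edge rule), else 0."""
    return [[1 if abs(r) >= threshold else 0 for r in row] for row in corr]


def modules(adj):
    """Group genes into connected components.

    Assumption: a simple stand-in for hierarchical clustering.
    Returns lists of 0-based gene indices.
    """
    n = len(adj)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for other in range(n):
                if adj[node][other] and other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


def reorder(adj, order):
    """Permute rows and columns so genes in the same module sit together."""
    return [[adj[i][j] for j in order] for i in order]


adj = connection_matrix(CORR)
mods = modules(adj)
order = [gene for module in mods for gene in module]
clustered = reorder(adj, order)
gene_labels = [[gene + 1 for gene in module] for module in mods]  # 1-based

print("Connection matrix:", adj)
print("Modules (gene numbers):", gene_labels)
print("Clustered connection matrix:", clustered)
```

On real data with hundreds of samples and thousands of genes, a proper hierarchical clustering of a topological-overlap or correlation distance would replace the connected-components shortcut used here.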
16. Preliminary Probabilistic Models- Rosetta/Schadt
Networks facilitate direct identification of
genes that are causal for disease
Evolutionarily tolerated weak spots
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Nat Genet (2005) 205:370
17. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
19. Recognition that the benefits of bionetwork based molecular
models of diseases are powerful but that they require
significant resources
Appreciation that it will require decades of evolving
representations as real complexity emerges and needs to be
integrated with therapeutic interventions
20. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
22. Engaging Communities of Interest
NEW MAPS
Disease Map and Tool Users-
(Scientists, Industry, Foundations, Regulators...)
PLATFORM
Sage Platform and Infrastructure Builders-
(Academic Biotech and Industry IT Partners...)
RULES AND GOVERNANCE
Data Sharing Barrier Breakers-
(Patients Advocates, Governance and Policy Makers, Funders...)
NEW TOOLS
Data Tool and Disease Map Generators-
(Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs…)
PILOTS = PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)
23. Platform Commons Research
Research: Cancer, Neurological Disease, Metabolic Disease
[Figure: Sage activity map. Recoverable labels:]
Building Disease Maps: Curation/Annotation
Data Repository: Public Data, Merck Data, TCGA/ICGC
Outposts: Federation, CCSB
Partners: Pfizer, Merck, Takeda, Astra Zeneca, CHDI, Gates, NIH
Commons Pilots: CTCAP, LSDF-WPP, Inspire2Live, POC
Discovery Platform: Hosting Data, Hosting Tools, Hosting Models
Tools & Methods: Bayesian Models, Co-expression Models, KDA/GSVA
LSDF
24. Example 1: Breast Cancer
Coexpression Networks
Module combination
Partition BN
Bayesian Network
Survival Analysis
Zhang B et al., manuscript
25. Generation of Co-expression & Bayesian Networks from
published Breast Cancer Studies
4 Public Breast Cancer Datasets
NKI (295 samples): van de Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19;347(25):1999-2009.
Wang (286 samples): Wang Y et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005 Feb 19-25;365(9460):671-9.
Miller (159 samples): Pawitan Y et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7(6):R953-64.
Christos (189 samples): Sotiriou C et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006 Feb 15;98(4):262-72.
26. Recovery of EGFR and Her2 oncoproteins
downstream pathways by super modules
28. Key Driver Analysis
• Identify key regulators for a list of genes h and a network N
• Check the enrichment of h in the downstream of each node in N
• The nodes significantly enriched for h are the candidate drivers
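The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published Key Driver Analysis implementation: the network is treated as a directed graph stored as a dictionary, "downstream" means every node reachable from the candidate, enrichment is scored with a one-sided hypergeometric test, and no multiple-testing correction is applied. The toy network, the gene list, and all function names are hypothetical.

```python
from math import comb


def downstream(network, node):
    """All nodes reachable from `node` along directed edges (node itself excluded)."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child in network.get(current, ()):
            if child not in seen and child != node:
                seen.add(child)
                stack.append(child)
    return seen


def hypergeom_pvalue(universe, hits_total, drawn, hits_drawn):
    """P(X >= hits_drawn) when `drawn` nodes are sampled without replacement
    from `universe` nodes, of which `hits_total` belong to the gene list."""
    upper = min(hits_total, drawn)
    tail = sum(
        comb(hits_total, k) * comb(universe - hits_total, drawn - k)
        for k in range(hits_drawn, upper + 1)
    )
    return tail / comb(universe, drawn)


def key_drivers(network, gene_list, alpha=0.05):
    """Return (node, overlap, p-value) for nodes whose downstream set is enriched for the list."""
    nodes = set(network) | {c for children in network.values() for c in children}
    h = set(gene_list) & nodes
    results = []
    for node in sorted(nodes):
        down = downstream(network, node)
        if not down:
            continue  # leaf nodes regulate nothing in this network
        overlap = len(down & h)
        p = hypergeom_pvalue(len(nodes), len(h), len(down), overlap)
        if p < alpha:
            results.append((node, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy network N (regulator -> targets) and gene list h.
network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "F": ["G", "H"],
    "G": ["I"],
    "H": ["J"],
}
drivers = key_drivers(network, ["B", "C", "D", "E"])
for node, overlap, p in drivers:
    print(f"{node}: {overlap} list genes downstream, p = {p:.4g}")
```

In this toy case only node A is reported, because all four listed genes sit downstream of it. A real analysis would typically limit the downstream search to a fixed number of steps and correct the p-values for the number of nodes tested.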
29. A) Cell Cycle (blue) B) Chromatin modification (black)
C) Pre-mRNA Processing (brown) D) mRNA Processing (red)
Global driver
Global driver & RNAi
validation
31. Example 2. The Sage Non-Responder Project in Cancer
Purpose: To identify Non-Responders to approved drug regimens so we can improve outcomes, spare patients unnecessary toxicities from treatments that have no benefit to them, and reduce healthcare costs
Leadership: Co-Chairs Stephen Friend, Todd Golub, Charles Sawyers & Rich Schilsky
Initial Studies:
• AML (at first relapse)
• Non-Small Cell Lung Cancer
• Ovarian Cancer (at first relapse)
• Breast Cancer
• Renal Cell
• Multiple Myeloma
32. Model of Alzheimer’s Disease (Bin Zhang, Jun Zhu)
[Figure: paired AD vs. normal panels; labeled module: Cell cycle]
http://sage.fhcrc.org/downloads/downloads.php
33. New Type II Diabetes Disease Models (Anders Rosengren)
Global expression data from 64 human islet donors
Blue module: 3000 genes
Associated with
• Type 2 diabetes
• Elevated HbA1c
• Reduced insulin secretion
340 genes in islet-specific open chromatin regions
168 overlapping genes, which have
• Higher connectivity
• Markedly stronger association with
– Type 2 diabetes
– Elevated HbA1c
– Reduced insulin secretion
• Enrichment for beta-cell transcription factors and exocytotic proteins
34. New Type II Diabetes Disease Models (Anders Rosengren)
• Search across 1300 datasets in MetaGEO at Sage for similar expression profiles
Top hit: Islet dedifferentiation study where the 168 genes were upregulated in
mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)
• Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test
• Identification of candidate key genes affecting beta-cell differentiation and chromatin
Working hypothesis:
Normal beta-cell: open chromatin in islet-specific regions,
high expression of beta-cell transcription factors,
differentiated beta-cells and normal insulin secretion
Diabetic beta-cell: lower expression of beta-cell transcription
factors affecting the identified module, dedifferentiation,
reduced insulin secretion and hyperglycemia
Next steps: Validation of hypothesis and suggested key genes in human islets
35. Clinical Trial Comparator Arm
Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create powerful
new tool for drug development.
36. Examples: The Sage Federation
• Founding Lab Groups
– Seattle- Sage Bionetworks
– New York- Columbia: Andrea Califano
– Palo Alto- Stanford: Atul Butte
– San Diego- UCSD: Trey Ideker
– San Francisco: UCSF/Sage: Eric Schadt
• Initial Projects
– Aging
– Diabetes
– Warburg
• Goals: Share all datasets, tools, models
Develop interoperability for human data
38. Federation's Genome-wide Network and Modeling Approach
Califano group at Columbia Sage Bionetworks Butte group at Stanford
39. Human Aging Project
Data: Brain A (n=363), Brain B (n=145), Brain C (n=400), Blood A (n=~1000), Blood B (n=~1000), Adipose (n=~700)
Transformations: Interactome, TF Activity Profile, Gene Set / Pathway Variation Analysis
Machine Learning: Elastic Net, Network Prior Models, Tree Classifiers
Output: Age Model
41. Inferring Prostate Cancer Regulatory Modules for Glycolysis & Glycogenesis Metabolism Pathway
Sage Bionetworks approach
Prostate cancer global coherent data set (GSE21032): Taylor BS. et al (2010) Cancer Cell 18(1):11-22
Integrated Bayesian Approach: Zhu J. et al (2008) Nature Genetics 40(7):854-61
Glycolysis and Glycogenesis Metabolism Gene Set (GGMSE)
Inferred Transcriptional Regulatory Network in Prostate Cancer
Prostate Cancer Regulatory Modules for GGMSE and Other Metabolism Pathways
Duarte N. et al (2006) PNAS 107(6):1777-1782
Cox Proportional-Hazards Regression model based on individual gene for recurrence free survival
Metabolism pathways with regulatory modules enriched by poor prognosis genes for prostate cancer
42. Genes Associated with Poor Prognosis are disproportionately found among the networks regulating the “glycolysis” Genes
P-Value<0.005. Size of the node proportional to -log10 P value for recurrence free survival.
Left panel: Inferred regulatory module for GGMSE
Right panel: Inferred regulatory module for Oxidative Phosphorylation and Sphingolipid Metabolism genes
>5 fold enrichment of recurrence free prognostic genes with the Glycolysis BN module than random selection (p<1e-100)
43. Federated Aging Project: Combining analysis + narrative = Sweave Vignette
Sweave Vignette: R code + narrative, producing PDF (plots + text + code snippets), HTML, and data objects
Labs: Sage Lab, Califano Lab, Ideker Lab
Shared Data Repository
JIRA: Source code repository & wiki
Submitted Paper
44. Why not share clinical/genomic data and model building in the
ways currently used by the software industry
(power of tracking workflows and versioning)?
45. Synapse as a Github for building models of disease
61. Absurdity of Current R&D Ecosystem
• $200B per year in biomedical and drug discovery R&D
• Handful of new medicines approved each year
• Productivity in steady decline since 1950
• 90% of novel drugs entering clinical trials fail
• NIH and EU just started spending billions to duplicate process
• Significant pharma revenues going off patent in next 5 years
• >30,000 pharma employees fired in each of last four years
• Number of R&D sites in Europe down from 29 to 16 since 2009
62. What is the problem?
• Regulatory hurdles too high?
• Low hanging fruit picked?
• Payers unwilling to pay?
• Genome has not delivered?
• Valley of death?
• Companies not large enough to execute on strategy?
• Internal research costs too high?
• Clinical trials in developed countries too expensive?
In fact, all are true but none is the real problem
63. What is the problem?
• The current system is designed as if every new program is destined to
deliver an approved drug
• Past 20 years prove this assumption wrong (again and again)
• Why do promising early results rarely translate into approved drugs?
• Bottom line: we have poor understanding of biology
• Lack of early-data sharing within closed information systems dooms
drug discovery to frequent avoidable failure
64. What is the problem?
We need to rebuild the drug discovery process so that we
better understand disease biology before testing proprietary
compounds on sick patients
65. The solution – Arch2POCM
1. Create an Archipelago of clinicians and scientists from public
and private sectors to take projects from ideas to Proof of
Clinical Mechanism (POCM)
2. Arch2POCM is a collaborative, data-sharing network of
scientists, whose drug discovery objective is to use robust
compounds against new targets to disentangle the complexity
of human biology, not to create a medicine
3. Success?
• A compound that provides proof of concept for a novel target,
allowing companies to use this common information to compete,
with dramatically increased chances of success
• Culling targets with doomed mechanisms before multiple companies
waste money exploring them - at $50M a pop
66. Why data sharing through to Phase IIb?
• Most rapidly reveals limitations and opportunities associated with the
target
• Increases probability of success for internal proprietary programs
• Scientific decisions are not influenced by market considerations or
biased internal thinking
• Target mechanism is only properly tested at Phase IIb
67. Why no IP on “Common Stream” compounds?
• Allows multiple groups to test diverse indications without funds
from Arch2POCM- crowdsourcing drug discovery
• Broader and faster data dissemination
• Far fewer legal agreements to negotiate
• Generates “freedom to operate” on target because there are
no patent thickets to wade through
• Efficient way to access world’s top scientists and doctors
without hassle
69. First major milestones
2013- First Compound in clinical trials
2014- Go and No-Go Decisions from common stream of targets driving
Proprietary Programs
2014- Full complement of target programs activated
2014- Core Clinical Programs joined by crowdsourced clinical trials
70. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
71. OPPORTUNITIES FOR MIPS COMMUNITY
Data sets, Tools and Models
Joining Synapse Communities
Joining Federation Projects
Joining Arch2POCM
Change reward structures for sharing data
(patients and academics)