Statistical modeling in pharmaceutical research and development.
Friend Gastein 2012-10-04
1. A new community based vision of
open access innovation in
personalized medicine
Stephen H Friend MD PhD
President Sage Bionetworks
Non-Profit Organization
Seattle/ Amsterdam/ Beijing
Gastein Oct 4, 2012
3. NOT
MISSION IMPOSSIBLE
1. It’s going to be harder than you think but inevitable
2. Without deep citizen activation it will be unaffordable
3. Sharing data and models between researchers
especially between and within Universities
will need to fundamentally change
11. Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
• >80 Publications from Research
Metabolic Disease "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc
CVD "Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome
Bone "Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
Methods "An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
12. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
14. Networked Approaches
BioMedicine Information Commons (SYNAPSE)
Participants: Data Generators, Patients/Citizens, Data Analysts, Clinicians, Experimentalists
Shared resources: RAW DATA, CURATED DATA, TOOLS/METHODS, ANALYSES/MODELS
15. NOT
MISSION IMPOSSIBLE
1. It’s going to be harder than you think but inevitable
2. Without deep citizen activation it will be unaffordable
3. Sharing data and models between
researchers especially between and within Universities
will need to fundamentally change
18. We still approach much clinical research as if we were
“hunter-gatherers”: not sharing
21. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a “commons” where integrative bionetworks are evolved by
contributor scientists and citizens
22. Networked Team Approaches
BioMedical Information Commons (SYNAPSE)
Participants: Data Generators, Patients/Citizens, Data Analysts, Clinicians, Experimentalists
Shared resources: RAW DATA, CURATED DATA, TOOLS/METHODS, ANALYSES/MODELS
Numbered labels around the diagram:
1 USABLE DATA
2 PRIVACY BARRIERS
3 HOW TO DISTRIBUTE TASKS
4 REWARDS FOR SHARING
5 EDUCATION BIOINFORMATICS
23. COMPONENTS NEEDED FOR NETWORKED APPROACHES TO
BUILDING EVOLVING MODELS OF DISEASE: RESEARCH 2.0
SYNAPSE: GEEKS AND SCIENTISTS SANDBOX; PLACE TO BUILD MODELS OF DISEASE
24. Synapse Platform: a compute space for collaborative research
• Development of Robust, Reproducible, and Reusable analytical methods
• Integration of Data, Tools and Methods from across community
• Development of a Disease Model Repository
• Forum for New Collaborations between technically and geographically distinct scientific groups
• Access to Cloud-Compute resources co-located with large-scale data
synapse.sagebase.org
25. COMPONENTS NEEDED FOR NETWORKED APPROACHES TO
BUILDING EVOLVING MODELS OF DISEASE: RESEARCH 2.0
PORTABLE LEGAL CONSENT: ALLOWS PATIENT TO REQUEST DATA BACK; GIVES CONTROL OF DATA TO PATIENT, WHO CAN THEN SAY "I WANT TO SHARE IT"
SYNAPSE: GEEKS AND SCIENTISTS SANDBOX; PLACE TO BUILD MODELS OF DISEASE
26. Tool: PORTABLE LEGAL CONSENT (US-approved)
weconsent.us
John Wilbanks
• Online educational wizard
• Tutorial video
• Legal Informed Consent Document
• Profile registration
• Data upload
27. Open and Networked Approaches: PRIVACY BARRIERS
Regulatory issues and bottlenecks
Is data anonymized?
  Yes: proceed
  No: Is data pseudonymized?
    Yes: Is it “sensitive” data (health, genomic, ..)?
      No: Will key to person’s ID be shared with 3rd party?
        No: proceed with appropriate safeguards for data access and safekeeping
Remaining Yes/No branches: consent is required
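The flowchart on this slide can be read as a small decision function. The following is only a minimal sketch: the `Dataset` fields and the function name are hypothetical, and it assumes that every branch not ending in "proceed" leads to "consent is required", which the flattened slide layout does not make fully explicit.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a dataset's privacy status."""
    anonymized: bool
    pseudonymized: bool = False
    sensitive: bool = False                    # health, genomic, ...
    key_shared_with_third_party: bool = False  # key linking data to a person's ID


def consent_decision(d: Dataset) -> str:
    """Walk the slide's questions from top to bottom."""
    if d.anonymized:
        return "proceed"
    if not d.pseudonymized:
        return "consent required"   # assumed branch
    if d.sensitive:
        return "consent required"   # assumed branch
    if d.key_shared_with_third_party:
        return "consent required"   # assumed branch
    return "proceed with safeguards"


if __name__ == "__main__":
    example = Dataset(anonymized=False, pseudonymized=True, sensitive=True)
    print(consent_decision(example))
```

Keeping the questions in the same order as the slide makes it easy to adjust a branch if the original flowchart turns out to route it differently.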
28. COMPONENTS NEEDED FOR NETWORKED APPROACHES TO
BUILDING EVOLVING MODELS OF DISEASE: RESEARCH 2.0
INCLUDING CITIZENS: DEMOCRATIZATION OF MEDICINE
SYNAPSE: GEEKS AND SCIENTISTS SANDBOX; PLACE TO BUILD MODELS OF DISEASE
PORTABLE LEGAL CONSENT: ALLOWS PATIENT TO REQUEST DATA BACK; GIVES CONTROL OF DATA TO PATIENT, WHO CAN THEN SAY "I WANT TO SHARE IT"
BRIDGE: ENGAGES CITIZENS AS PARTNERS; PATIENTS, RESEARCHERS, FUNDERS
29. USE OF CO-OPETITIONS
The Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge
Building Better Models of Diseases Together
Goal: Assess the accuracy of computational models designed to predict
breast cancer survival based on clinical information about the patient's
tumor as well as genome-wide molecular profiling data including gene
expression and copy number profiles.
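To make "assess the accuracy" concrete: one common way to score survival predictions is the concordance index, the fraction of comparable patient pairs that a model ranks in the right order. The slide does not say which metric the Challenge used, so the sketch below is an illustration under that assumption; the function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index for right-censored survival data.

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death was ranked as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, not taken from the Challenge.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risk = [0.9, 0.7, 0.4, 0.1]
    print(concordance_index(times, events, risk))
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ordering, which is what makes a shared leaderboard of many submitted models directly comparable.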
30. Sage-DREAM Breast Cancer Prognosis Challenge
one month of building better disease models together
Caldas/Aparicio
breast cancer data
154 participants; 27 countries
334 participants; >35 countries
Sep 26 Status
Challenge Launch: July 17
>500 models posted to Leaderboard
31. Targeted treatment and drug repositioning in type 2
diabetes using molecular disease signatures
Goal: identify pathophysiological subgroups of type 2 diabetes (T2D) to enable
specific treatment targeted to the cellular disease mechanisms.
Patient, Physician, Researcher
33. MELANOMA
Education is derived from top-down experiential knowledge
Best accuracy of clinical diagnosis = 64% (Grin, 1990)
160k new cases/year
48k deaths in 2012 in US
HPI; ABCDE; “ugly duckling”; MD
Both intra- and inter-institutional data are siloed: Dermoscopy, Pathology, Molecular, ?Photos
There is no standard screening program for skin lesions; seeing an MD is self directed
34. MELANOMA
virtuous cycle: continuous aggregation of data enriching the model
1. activated citizens take skin pictures
2. store tons of data!
3. run algorithmic challenges in the compute space
4. give back risk-assessment & education to the citizens
35. The challenge of Open science
Regulatory issues and bottlenecks
• Cultural barriers
• Lack of leadership
• Privacy barriers
• Complex, country-specific regulations try to codify ethical principles
Common areas of Concern with Genomic Data
• Privacy
• Research Oversight
• Informed Consent
• Data Stewardship
36. Enabling Cooperative Discovery
Common Concerns with use
of genomic data
• Privacy
• Research Oversight
• Informed Consent
• Data Stewardship
Common Concerns with sharing
scientific data
• Being scooped
• Loss of funding
• Tenure denied
• Publication record
• Loss of potential profit
• Lack of recognition
• Loss of control
37. CONSENT Consent must be freely given, unambiguous and specific.
Consent may involve clicking an icon, sending an email or
subscribing to a service. Consent can be withdrawn at any time
(research exemption).
Potential Issues:
• Single study focus: Use of existing data is often difficult because consent language is either too vague or obsolete.
• Re-consenting isn’t always feasible: Use of archival data and/or specimens collected from deceased individuals prior to the genomics era.
• Consent conditional on guarantee of anonymity, privacy and confidentiality
Questions:
• Is the DNA data of a deceased 50-year-old male, smoker, codename XY12ZS, identifiable data subject to the consent requirement?
• How can we ensure optimal use of data expected by participants?
• How will standard information notice and consent keep up with new technologies?
Potential Opportunities:
• Promote continuous interaction between subject and researchers; educate
• Roll-out Portable Legal Consent within Europe
38. SAFEGUARDS Appropriate technical and organizational measures shall be
taken against unauthorized or unlawful processing of personal
data and against accidental loss or destruction of, or damage
to, personal data. Privacy by default and by design.
Data controller is liable and accountable for data processor
Synapse safeguards: Multiple solutions to address compliance
Potential Issues:
• Guaranteed anonymity and privacy is a myth: Unintentional misuse of the data, accidental data breach or intentional violation of terms may still occur whether the data is handled electronically or not.
• Enforcement challenges: Cannot police each activity from all users or assess the adequacy of data protection by each user in an open collaborative space.
• Obtaining written contracts with each user is a bottleneck.
Questions:
• Shouldn’t we focus on education rather than on unrealistic guarantees of privacy?
• Will we introduce legislation that prevents discrimination based on personal data: Anti-discrimination by default?
Anticipated actions:
• Engage fines, exclusion, public shame as possible
responses to breach or violations
39. TRANSFER Personal data shall not be transferred to a country or territory
outside the European Economic Area unless that country or
territory ensures an adequate level of protection for the rights and
freedoms of data subjects in relation to the processing of personal
data.
Issues:
• Web technology isn’t tied to geographical boundaries
• US Safe Harbor stamp from US Department of Commerce is for e-commerce, not research.
• Restricts feasibility of international Challenge/modeling competitions
• Incompatible with Cloud computing for BIG DATA analysis
Questions:
• Will we need to restrict EU data to EU servers and have them used by EU scientists only?
• Will we need to split international datasets?
Anticipated actions:
• Discuss possibility to certify non-EU data repositories for inclusion and transfer of EU data.
• Let subjects determine where their data can be used
40. CLOUD Cloud providers must provide information on How, Where and by
Whom the data is being processed at all times.
Cloud customers should perform a risk assessment related to cloud
provider’s data protection practices.
Rules on data transfer remain.
Potential Issues:
• Based on single data user for single dataset
• Cloud providers will not agree to host sensitive data if they are liable for misuse of the data by their customers or sub-processors
• Same resource for both data storage and data analysis
• Data location: EU data on EU-CLOUD. What about non-EU data?
• Roles and responsibilities for Synapse developer vs. Cloud provider and Synapse users
Opportunities:
• EU could develop certification for CLOUD providers with respect to data protection
• Cloud use and data transfer limitations should not be incompatible
41. E-privacy Users must be informed of use of cookies or
similar devices and be allowed to opt-out
Potential issues:
• The need for transparency and accountability of Synapse users implies renouncing privacy by design
– Synapse users must register
– Actions are tracked
– Access and Compliance Team can audit data use in Synapse
– Opting out is not allowed.
Anticipated actions:
Explain full transparency existing in Synapse and other Information Commons
and refuse access to users who opt out
42. NOT IF WE WORK
TOGETHER
MISSION IMPOSSIBLE
1. It’s going to be much harder than you think - but inevitable
2. Without deep citizen activation it will be unaffordable
3. Sharing data and models between
researchers especially between and within Universities
will need to fundamentally change