tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Project MS Repository Dataset as a Case Study
Stephen Wicks, Rancho BioSciences
The Accelerated Cure Project for Multiple Sclerosis (ACP) is a non-profit focused on accelerating research toward a cure for MS. One of its major projects over the last decade has been the generation of the ACP Repository, a collection of biological samples and associated clinical data from approximately 3200 case or control participants. More than 75 studies are underway or have been completed, in both industry and academic settings, using samples from the ACP Repository. Rancho BioSciences has partnered with ACP through Orion Bionetworks to curate and load these datasets and the associated clinical CRFs into tranSMART. In this talk, we will describe the rich ACP dataset and discuss our experiences in preparing the data for analysis in tranSMART.
1. The ACP MS Repository: A Case Study
tranSMART Community Meeting, Nov. 2013
Stephen Wicks, Ph.D.
2. The ACP Repository: a Case Study
What is Multiple Sclerosis?
Chronic inflammatory/demyelination disorder affecting
the CNS. (about 0.1%)
Leading cause of neurological disability in young
adults.
Symptoms are variable and significant. They involve
vision, cognition, locomotion, pain, disorientation,
dexterity, mood, bowel/bladder control, and others.
Generally progressive, but progression is idiosyncratic.
(CIS → RRMS → SPMS vs. CIS → PPMS, etc.)
Complex etiology
3. The ACP Repository: a Case Study
What is the cost of MS?
Difficult and costly to diagnose (MRI, symptom
variability leads to extensive differential diagnosis)
Treatments can slow progression, but are expensive.
Many different drugs exist, but patient stratification
for maximum efficacy and minimum side effects is
non-existent. “Roll the dice”
Often strikes early in life, and is a life-long disability.
Average age at diagnosis is about 30; 5% before age 16.
4. Orion Bionetworks
ACP is a founding member of Orion.
Orion seeks to cure MS by harnessing the power of
computational modeling of disease progression.
ACP will provide its data to Orion in tranSMART to
facilitate this goal.
Rancho BioSciences will curate and harmonize the
ACP data for Orion
5. The ACP Repository: a Case Study
ACP and the MS Repository
Founded in 2001 by an MIT entrepreneur with MS
ACP MS Repository started in 2006. The goal was
to identify the cause of MS.
ACP MS Repository enrollment shut down this
year. Approximately 3200 participants enrolled.
Biosamples, demographics, medical history etc.
Research data
OPT-UP
7. The ACP Repository: a Case Study
The ACP Engine
“Matchmaker”
Database Graphical User Interface
MS Discovery Forum
Reviewing Developments in the MS Field
Communicating with MS Researchers
Allowing MS Researchers Worldwide to Explore the ACP Repository Database
ACP Repository
$13 million invested
3200+ participants
Biosamples & Datasets
77 sets of biosamples+data
(b,m)illions of datapoints, from 36 studies, so far
MS Researchers Worldwide
Academia & Industry
Insights and Results
Mechanisms
Diagnostics
Causes
Treatments
8. The ACP Repository: a Case Study
ACP MS Repository
Open-access collection of highly annotated blood-derived samples plus data from MS, related diseases, &
control subjects gathered from 2006-2013.
Requirement for research data derived from samples to
be deposited (with a provision for IP protection).
Contributes to MS+ research in many ways:
- Enables studies that might not be conducted otherwise (academic & commercial)
- Creates a common results database for studies from multiple bio-analytical techniques on overlapping sets of subjects.
Approximately 3200 participants.
“Working with them (ACP) allowed us to obtain critical samples and confirm our results for only $20,000. If I had to obtain these samples from scratch, it would have cost $1 million and added 5 years to the project.”
Thomas M. Aune, PhD, Molecular Biology, Vanderbilt University School of Medicine (from Scientific American)
10. The ACP Repository: a Case Study
ACP Case Report Form
48-page (first visit) and 38-page (second visit)
complete clinical workup
Form completed with the assistance of a
clinical research associate over a several hour
interview (with sample draw and lab workup)
Broad data: 80 distinct tables in an SQL
database
Deep data: in flat data files, more than 20
million cells
11. The ACP Repository: a Case Study
CRF Sample Fields
Illustrates some of the problems associated with curating this dataset.
103 distinct textual responses: “Betseron”, beta-seron, betaseron, BETASERON, etc.
Study drugs of trial enrollment: “CS-0777”, or “BG00012 (FUMARATE) OR PLACEBO”
“First Drug”, “Second Drug”, etc.: the order was meaningless.
No consistent measure of units, frequency, etc.
Inappropriate (sometimes lethally so) drug frequency.
12. The ACP Repository: a Case Study
DMD Curation Solutions
We applied drug ontologies and mapping
vocabularies where needed.
We repaired and consolidated dose,
frequency, etc. to a single measure with 3
values (high, standard, low)
We re-formatted the data to eliminate the
ambiguous cardinal ordering of reporting
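One way to picture the last two steps is the sketch below. It is not the pipeline used for the ACP data: the column names, the subject IDs, and the small synonym table are assumptions made up for illustration, and a real curation would draw on a drug ontology rather than a hand-written dictionary. It normalizes spelling variants and turns “First Drug”, “Second Drug” columns into unordered (subject, drug) observations.

```python
# Hypothetical wide-format CRF rows; column names and values are illustrative only.
ROWS = [
    {"subject": "S001", "first_drug": "BETASERON", "second_drug": "beta-seron"},
    {"subject": "S002", "first_drug": "Betseron", "second_drug": ""},
]

# Assumed synonym table (keys are lower-cased, punctuation-free spellings).
CANONICAL = {
    "betaseron": "Betaseron",
    "betseron": "Betaseron",
}


def normalize_drug(raw):
    """Lower-case, keep only letters and digits, then look up the canonical name.

    Unknown spellings are returned stripped but otherwise unchanged so that
    they can be reviewed by hand.
    """
    key = "".join(ch for ch in raw.lower() if ch.isalnum())
    return CANONICAL.get(key, raw.strip())


def wide_to_long(rows):
    """Convert '*_drug' columns into a sorted list of unique (subject, drug) pairs.

    Using a set means the position of a drug in the form (first, second, ...)
    no longer carries any meaning.
    """
    observations = set()
    for row in rows:
        for column, value in row.items():
            if column.endswith("_drug") and value.strip():
                observations.add((row["subject"], normalize_drug(value)))
    return sorted(observations)


LONG = wide_to_long(ROWS)
```

In this toy input, three different spellings collapse to a single observation per subject, which is the effect the reformatting is meant to have.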
13. The ACP Repository: a Case Study
CRF Sample Fields
Multiple Drugs (Observations) were addressed with…
15. The ACP Repository: a Case Study
Controlled Vocabularies (sports)
~5000 responses
779 distinct sports reported
When filtered by keyword: “ski”, 29; “gym”, 45; “walk”, 30; “jog”, 17; “run”, 40
16. The ACP Repository: a Case Study
Controlled Vocabularies (sports)
All sports mapped to a 29-term vocabulary.
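A keyword filter like the one on the previous slide can be turned into a first-pass mapper. The sketch below assumes a rule table of (substring, term) pairs; the term names are invented for illustration, since the actual 29-term vocabulary is not listed in the slides.

```python
from collections import Counter

# Assumed rules: the first matching substring wins. Term names are hypothetical.
RULES = [
    ("ski", "Skiing"),
    ("gym", "Gym/Fitness"),
    ("walk", "Walking"),
    ("jog", "Running"),
    ("run", "Running"),
]


def map_sport(response):
    """Map a free-text sport response to a vocabulary term, or 'UNMAPPED'."""
    text = response.strip().lower()
    for needle, term in RULES:
        if needle in text:
            return term
    return "UNMAPPED"


# Illustrative responses, not taken from the ACP data.
RESPONSES = ["Cross-country skiing", " Jogging", "running", "chess"]
COUNTS = Counter(map_sport(r) for r in RESPONSES)
```

Substring rules are crude (for example, “skipping” would match “ski”), so anything they map still deserves a manual pass, and whatever lands in `UNMAPPED` needs a rule or a hand-assigned term.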
17. The ACP Repository: a Case Study
Controlled Vocabularies (pets)
~6500 pets reported
600 distinct pets reported
When filtered by “dog”, 112; however, this misses
misspellings (“diog”, “dot”, “pubs”), dog-like pets
(“wolf”, “half-wolf”, “mutt”), and breeds (“poddle”,
“poodle”, “Afghan Hound”, etc.)
59 additional dog-like entries
18. The ACP Repository: a Case Study
Controlled Vocabularies (pets)
All pets mapped to a 31-category controlled vocabulary
19. The ACP Repository: a Case Study
Medication Curation Challenges
>10,000 medications listed.
2703 distinct medications listed.
Mapped these to 614 real medications (e.g. Amitriptyline)
This was split into two tables:
Continuing Medications (541 entities)
Stopped Medications (317 entities)
VISIT_NAME was used to represent distinct observations
across the whole study
Truly longitudinal measures were reified in the tree
hierarchy in the data mapping file.
Spellings reported for Amitriptyline:
Amitriptaline
Amitriptylin
Amitriptyline
Amitriptyline HCL
Amitroptyline
Amitryetyline
Amitrypatiline
Amitryptailine
Amitryptaline
Amitryptilin
Amitryptiline
Amitryptilline
Amitryptylene
Amitryptyline
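Collapsing spelling variants onto a canonical medication name can be approximated with fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library; it is an illustration, not the method used for the ACP data. The one-entry canonical list, the `cutoff` value, and the rule for dropping an “HCL” suffix are all assumptions.

```python
import difflib

# Assumed canonical list; the real mapping had 614 medications.
CANONICAL_MEDS = ["Amitriptyline"]


def map_medication(raw, cutoff=0.8):
    """Return the closest canonical medication name, or None if nothing is close.

    The cutoff is an assumed similarity threshold between 0 and 1.
    """
    name = raw.strip().lower()
    # Assumption: a trailing salt form such as " hcl" is not part of the name.
    if name.endswith(" hcl"):
        name = name[: -len(" hcl")]
    by_lower = {med.lower(): med for med in CANONICAL_MEDS}
    matches = difflib.get_close_matches(name, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```

For example, `map_medication("Amitryptiline")` and `map_medication("Amitriptyline HCL")` both resolve to `"Amitriptyline"`. Heavily mangled spellings can fall below the cutoff and come back as `None`, so they would still need manual review.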
21. The ACP Repository: a Case Study
Date and Time Coding
Dates in multiple formats:
15/12/2001
15/Dec/2001
Dec-2001
2001
Dec./2001
12/2001
--/--/-----/2001
------------/12/2001
All dates converted to periods (Months, Years,
or Days) prior to the relevant blood draw
date.
Dates were represented by International
Standard ISO 8601, i.e.
YYYY-MM-DD (e.g. 2001-12-15)
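The date handling described above can be sketched as follows. This is a minimal illustration rather than the curation code itself: it assumes day/month/year ordering, treats runs of dashes as missing parts, writes partial dates in ISO 8601 reduced precision (YYYY-MM or YYYY), and picks the period unit from whatever precision is available. The function names and the blood draw date in the usage note are hypothetical.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_partial_date(raw):
    """Return (year, month, day), with None for missing parts, or None if unparseable."""
    # Split on slashes, dashes, dots and spaces; dash-only placeholders vanish.
    tokens = [t for t in re.split(r"[/\-.\s]+", raw.strip()) if t]
    if not tokens or len(tokens) > 3 or not re.fullmatch(r"\d{4}", tokens[-1]):
        return None
    year = int(tokens[-1])
    month = day = None
    if len(tokens) >= 2:
        tok = tokens[-2]
        month = int(tok) if tok.isdigit() else MONTHS.get(tok[:3].lower())
        if month is None or not 1 <= month <= 12:
            return None
    if len(tokens) == 3:
        if not tokens[0].isdigit():
            return None
        day = int(tokens[0])
    return year, month, day


def to_iso(parts):
    """Format (year, month, day) as ISO 8601, dropping the parts that are missing."""
    year, month, day = parts
    if month is None:
        return f"{year:04d}"
    if day is None:
        return f"{year:04d}-{month:02d}"
    return date(year, month, day).isoformat()


def period_before(parts, draw):
    """Return (amount, unit) between the event and the blood draw date."""
    year, month, day = parts
    if month is None:
        return draw.year - year, "Years"
    if day is None:
        return (draw.year - year) * 12 + (draw.month - month), "Months"
    return (draw - date(year, month, day)).days, "Days"
```

With a hypothetical blood draw on `date(2010, 3, 1)`, the entry `"Dec-2001"` becomes `"2001-12"` and a period of 99 months, while `"2001"` stays at year precision and yields 9 years.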
22. The ACP Repository: a Case Study
Repository Usage
77 studies ongoing or completed
36 studies have returned data to ACP
Data types:
Low-D biomarker (antibodies, metabolites, serum
markers of inflammation, etc.)
Low-D genotype data
High-D SNP/GWAS data
Gene-expression studies
Whole-genome sequencing (2 distinct studies)
Study types:
Etiology
Diagnostics
Disease activity biomarkers
24. The ACP Repository: a Case Study
Research Data Curation Challenges
Few guidelines provided to researchers for data
formatting or treatment
Often little or no documentation describing how
the data was generated or handled (e.g. raw vs.
normalized, transformations)
Load study meta-data (contact info, description,
etc.) at the node level
25. The ACP Repository: a Case Study
Sample Study Results
Biogen gene expression study:
Designed to identify gene-expression
profiles that discriminate progressive
forms of MS from relapsing-remitting
forms of the disease.
26. The ACP Repository: a Case Study
Future Directions
Rancho BioSciences is providing guidance to ACP
for data-collection practices going forward (e.g.
OPT-UP)
We loaded the clinical data and 6 sample study
datasets into an Oracle-based tranSMART instance
that we host in-house for QC purposes.
The full dataset is slated to be loaded into a 1.1
PostgreSQL-based tranSMART instance (hosted by
Recombinant by Deloitte for Orion).
This and other data sources (Inst. For Neuroscience
at B&W) will be analyzed and modeled by Orion
Ethos of ACP = removing barriers to research. The first challenge was to create a new type of infrastructure for research to get conducted within. They say the journey of a thousand miles begins with a single step – for us the beginning of the journey was with a single study.