In the tranSMART 17.1 development project, The Hyve is enriching tranSMART with support for time series, samples and cross-study concepts. The project is led by the tranSMART Foundation and supported by Sanofi, Pfizer, Roche and AbbVie. Read more at http://thehyve.nl/transmart-17-1-time-series-samples-cross-study-concepts-and-more/.
As presented at the tranSMART Foundation Annual Meeting 2016: https://youtu.be/k1eKMhXbqOA?t=5h6m57s
4. What are we solving here?
1. Missing crucial functionality
  - Time series, samples, cross-study concepts
  - Transcript-level RNA-Seq data
  - Large file storage
2. Code problems: "technical debt"
  - Monolithic architecture
  - Lack of automated tests
  - Old versions of Grails/Java, many code repositories
  - No documentation of the database
5. Backend only
- Stable, commercial-grade core
- Decoupling of the backend from the transmartApp user interface via the REST API
- Towards an ecosystem of user interfaces on top of the tranSMART backend
- Why only the backend?
  - transmartApp has issues: assumptions, old layered code
  - Enough work already
- Current data will still work in transmartApp
- 17.1 project will be part of the full 17.1 release
7. Time series, samples and cross-study concepts
i2b2 database alignment and extension
8. History
- tranSMART was developed on top of i2b2 to combine clinical with omics data
- i2b2 has cross-study concepts (with ontology codes) and support for storing samples and time series data
- tranSMART lost this:
  - Concepts are study-specific
  - User interface assumes a patient-concept pair to have one value
"Patient John has for concept heart rate the value 80 bpm"
"Concept age in study A is not the same as concept age in study B"
9. Time series
- Absolute time
  - Blood measurement with start (and end) date+time
  - Hospital visit per patient grouping multiple measurements with start (and end) date+time
- Relative time
  - "Baseline" (0 days) or "Week 1" (7 days) observation
  - Shared between patients
- Ordinal time
  - First, second and third observation
10. Samples
- Differentiated by "modifiers"
  - Tumor and normal measurement
  - Multiple doses
  - Multiple tissues
- Differentiated only by a number, "instance_num"
  - Multiple replicates
11. Cross-study concepts
- We want "Age" in different studies to be the same concept
  - Get subjects which match "Age > 50" from ALL studies
- Use ontology codes, e.g. from an external ontology server
- Difference with i2b2: tranSMART is study-based
  - Study-based data loading
  - Study-based data access
- We need to support both
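The cross-study lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tranSMART implementation: the record layout, the concept codes `DEMO:AGE` and `VITAL:HR`, and the function name `patients_matching` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    study: str         # study the observation was loaded with
    patient: str       # patient identifier within the study
    concept_code: str  # shared ontology code (hypothetical values below)
    value: float


def patients_matching(observations, concept_code, threshold, study=None):
    """Return (study, patient) pairs whose value for the concept exceeds threshold.

    study=None searches across all studies; otherwise only the given study.
    """
    return {
        (o.study, o.patient)
        for o in observations
        if o.concept_code == concept_code
        and o.value > threshold
        and (study is None or o.study == study)
    }


OBSERVATIONS = [
    Observation("STUDY_A", "P1", "DEMO:AGE", 63),
    Observation("STUDY_A", "P2", "DEMO:AGE", 41),
    Observation("STUDY_B", "P1", "DEMO:AGE", 57),
    Observation("STUDY_B", "P3", "VITAL:HR", 80),
]

# "Age > 50" from ALL studies, possible because both studies share one code.
all_studies = patients_matching(OBSERVATIONS, "DEMO:AGE", 50)
# The same query restricted to one study (study-based access).
only_b = patients_matching(OBSERVATIONS, "DEMO:AGE", 50, study="STUDY_B")
```

Keeping the study on every observation is what lets one query serve both modes: cross-study by concept code, or study-based by adding a study filter.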
13. Time series and samples - Example 1
A study with tumor and normal samples
- Multiple observations for the same patient, differentiated by the modifier "tissue type".
- The Start_Date (and End_Date) for the observation will be empty.
- All observations will be linked to the same trial_visit, which will link to the study.
14. Time series and samples - Example 2
Clinical trial with multiple timepoints (Baseline, Week 1, Week 2)
- Multiple observations for the same patient, differentiated by their trial_visit.
- All observations will be linked to one of the available trial_visits, which will link to the study. Each trial_visit has a Label (Baseline), a Unit (Days) and a Value (0, 7 and 14).
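A rough Python sketch of Example 2 follows. The class and field names mirror the Label, Unit and Value attributes named on the slide, but the classes themselves, the study name `TRIAL_1` and the helper `time_series` are assumptions made for illustration and do not reflect the actual database schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialVisit:
    study: str  # each trial visit links to one study
    label: str  # e.g. "Baseline"
    unit: str   # e.g. "Days"
    value: int  # relative time expressed in `unit`


@dataclass(frozen=True)
class Observation:
    patient: str
    concept: str
    value: float
    trial_visit: TrialVisit


BASELINE = TrialVisit("TRIAL_1", "Baseline", "Days", 0)
WEEK_1 = TrialVisit("TRIAL_1", "Week 1", "Days", 7)
WEEK_2 = TrialVisit("TRIAL_1", "Week 2", "Days", 14)

observations = [
    Observation("P1", "heart rate", 82, WEEK_1),
    Observation("P1", "heart rate", 80, BASELINE),
    Observation("P1", "heart rate", 78, WEEK_2),
    Observation("P2", "heart rate", 71, BASELINE),
]


def time_series(observations, concept):
    """Group observations per patient, ordered by the trial visit's relative time."""
    series = defaultdict(list)
    for obs in sorted(observations, key=lambda o: o.trial_visit.value):
        if obs.concept == concept:
            series[obs.patient].append((obs.trial_visit.label, obs.value))
    return dict(series)


heart_rate = time_series(observations, "heart rate")
```

Because the trial visits are shared between patients, the same three `TrialVisit` objects order every patient's series.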
15. Time series and samples - Example 3 (1/2)
An EHR dataset with observation and visit timestamps and samples.
- Multiple observations for the same patient, differentiated by their observation Start_Date, visit and Instance_Num.
- The Start_Date (and End_Date) for the observation will be set to a timestamp.
- The Instance_Num will be set starting from 1 for multiple samples on the same observation Start_Date and visit.
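The Instance_Num rule can be sketched as a simple counter per group. This is an assumption-laden illustration: the slide only names Start_Date and visit as the grouping keys, so including patient and concept in the key, the dictionary layout, and the function name `assign_instance_nums` are choices made for this example.

```python
from collections import defaultdict
from datetime import datetime


def assign_instance_nums(rows):
    """Return copies of rows with instance_num counting from 1 within each
    (patient, concept, visit, start_date) group, in input order."""
    counters = defaultdict(int)
    numbered = []
    for row in rows:
        key = (row["patient"], row["concept"], row["visit"], row["start_date"])
        counters[key] += 1
        numbered.append({**row, "instance_num": counters[key]})
    return numbered


morning = datetime(2016, 3, 1, 9, 30)
afternoon = datetime(2016, 3, 1, 14, 0)
rows = [
    {"patient": "P1", "concept": "glucose", "visit": "V1", "start_date": morning, "value": 5.4},
    {"patient": "P1", "concept": "glucose", "visit": "V1", "start_date": morning, "value": 5.6},
    {"patient": "P1", "concept": "glucose", "visit": "V1", "start_date": afternoon, "value": 6.1},
]
numbered = assign_instance_nums(rows)
```

The two morning samples get instance numbers 1 and 2, while the afternoon sample starts again at 1 because its Start_Date differs.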
16. Time series and samples - Example 3 (2/2)
An EHR dataset with observation and visit timestamps and samples.
- The observations from a patient will be linked to one visit per hospital visit.
- The Start_Date (and End_Date) for the visit will be set to a timestamp including time and date for the hospital visit.
- All observations will be linked to the same trial_visit, which will link to the study.
17. Querying time series and samples
- Querying observations based on a combination of:
  - start time, end time
  - aggregated time series/samples values:
    - minimum, maximum, average
  - temporal constraints on sets of events:
    - define sets of events (e.g. A: all blood pressure readings for a patient, B: the first use of drug X by the patient)
    - specify constraints (e.g. all of A happen at least one week after any of B)
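One possible reading of the temporal constraint "all of A happen at least one week after any of B" is sketched below. The interpretation is an assumption: "any of B" is taken to mean "at least one event in B", in which case comparing against the earliest B event is sufficient. The function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def all_at_least_after_any(events_a, events_b, gap):
    """True if every timestamp in A lies at least `gap` after some timestamp in B.

    Under this "any" reading it is enough to compare against the earliest B.
    Returns False when B is empty, and True when A is empty but B is not.
    """
    if not events_b:
        return False
    earliest_b = min(events_b)
    return all(a - earliest_b >= gap for a in events_a)


# Set A: all blood pressure readings for a patient.
blood_pressure_readings = [datetime(2016, 5, 10), datetime(2016, 5, 20)]
# Set B: the first use of drug X by the patient.
first_drug_x_use = [datetime(2016, 5, 1)]

one_week = timedelta(weeks=1)
ok = all_at_least_after_any(blood_pressure_readings, first_drug_x_use, one_week)
too_soon = all_at_least_after_any([datetime(2016, 5, 5)], first_drug_x_use, one_week)
```

If "any of B" were instead read as "every event in B", the comparison would use `max(events_b)` rather than `min(events_b)`.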
18. Querying time series and samples
- Querying patients based on observations:
  - Certain constraints are valid for any or for all observations for the patient
  - Return patients where all observations of high blood pressure occur after supply of drug X.
- Querying for aggregated values for numerical data:
  - minimum, maximum, average
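The difference between "any" and "all" constraints, and the per-patient aggregates, can be illustrated with a short Python sketch. The data layout, the function names and the threshold of 140 are assumptions for this example and do not come from the slides or from the tranSMART API.

```python
from collections import defaultdict
from statistics import mean


def group_by_patient(observations):
    """Collect each patient's observations into a list."""
    per_patient = defaultdict(list)
    for obs in observations:
        per_patient[obs["patient"]].append(obs)
    return per_patient


def patients_where(observations, predicate, quantifier=all):
    """Return patients for whom predicate holds for all (or any) of their observations."""
    return {
        patient
        for patient, obs_list in group_by_patient(observations).items()
        if quantifier(predicate(o) for o in obs_list)
    }


def aggregate(observations):
    """Return minimum, maximum and average value per patient."""
    result = {}
    for patient, obs_list in group_by_patient(observations).items():
        values = [o["value"] for o in obs_list]
        result[patient] = {"min": min(values), "max": max(values), "avg": mean(values)}
    return result


systolic = [
    {"patient": "P1", "value": 150},
    {"patient": "P1", "value": 130},
    {"patient": "P2", "value": 145},
    {"patient": "P2", "value": 160},
]
HIGH = 140  # illustrative threshold only


def is_high(obs):
    return obs["value"] > HIGH


always_high = patients_where(systolic, is_high, all)
ever_high = patients_where(systolic, is_high, any)
stats = aggregate(systolic)
```

Passing Python's built-in `all` or `any` as the quantifier keeps the two query modes in a single function.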
20. tranSMART data types
- Metadata
  - Study, concept, patient metadata / Links to source data
- Clinical / NHTMP / Derived imaging data / Biobanking data
  - numerical and categorical
- Gene expression - RNA
  - Microarray
  - mRNAseq, miRNAseq - only linked to genes
  - qPCR miRNA
- Copy Number Variation data (Array CGH)
- Small Genomic Variants (SNP, indel; VCF format)
- Large genomic rearrangements
- Proteomics
  - Protein mass spectrometry: peptide or protein quantities
  - Immunoassay Rule-based medicine (RBM): analyte concentrations
- Metabolomics
  - Metabolite quantities
21. Transcript-level RNA-Seq data
- Adding a data type where measurements (readcount, normalised readcount and z-score) are linked to transcripts instead of genes
- Dictionary will link genes to transcripts for searching

REF_ID    GPL_ID                   CHROMOSOME  START_BP  END_BP  TRANSCRIPT
ENST0001  RNASEQ_TRANSCRIPT_ANNOT  X           1000      1100    TR1
2         RNASEQ_TRANSCRIPT_ANNOT  Y           2000      2500
3         RNASEQ_TRANSCRIPT_ANNOT  10          3000      4000    TR2
23. Linking with Arvados: Scalable Genomics
- Linking files in Arvados to studies in tranSMART for the storage of large files (e.g. BAM, VCF)
- If possible:
  - Align with linking files in MongoDB to studies
- Eventual UI goals:
  - See in tranSMART which Arvados files are linked to a study
  - Start an Arvados workflow on Arvados files from tranSMART
25. Upgrade path / data migration
- If you have your data in 16.1 or 16.2
  - There will be a data migration path provided to 17.1
26. Backwards compatibility
If you have your data in 16.1 or 16.2
- The current user interface (transmartApp) will still work on current data
  - So only for data without time series, samples,
  - Plugins are not guaranteed to work (but might very well)
- The current REST clients will still work with the V1 version of the REST API
29. Automated testing
- The Core API will have unit and integration tests with a minimum test coverage of 70%.
- The RESTful API will have automated functional tests for all API calls.
34. The Hyve team
- Project manager: Erik van Eeuwijk
- Business analyst: Ward Weistra (me)
- Technical lead: Gijs Kant
- Development team:
  - Piotr Zakrzewski (present)
  - Ruslan Forostianov (present)
  - Jan Kanis
  - Ewelina Grudzien
  - Olaf Meuwese
  - Barteld Klasens (automated testing)
35. Timeline
- Module A and B: End of 2016
  - Time series, samples, cross-study concepts
  - Transcript-level RNA-Seq
- Module C and project release: End of Q1 2017
  - Linking with Arvados
- tranSMART 17.1 version release: Q2 2017
  - Integration with all community developments