SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Design and generation of Linked Clinical
Data Cube
Laurent Lefort and Hugo Leroux
1st Workshop on Semantic Statistics, 22 October 2013
CSIRO COMPUTATIONAL INFORMATICS
Australian Imaging Biomarkers and Lifestyle data
Problem – Solution – Remaining challenges

• Data collected for the AIBL study
• (aibl.csiro.au)
• For the early detection of Alzheimer’s Disease
• Protocol aligned with Alzheimer’s Disease Neuroimaging Initiative
(ADNI)
• Plus nutrition, lifestyle
• Microdata collected via electronic data capture tool (OpenClinica)
• To be consumed by researchers
• 1) discovery + data quality assessment (fix)
• 2) production of publishable results

• Original format (export format): CDISC ODM
• Clinical Data Interchange Standards Consortium
• Operational Data Model

2 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
AIBL Protocol
and e-data capture
Vital Signs
Blood
Neuropsychological t.
Demographics
Medications
PiB PET scan and MRI for ¼
Diagnostic Summary
Diet and lifestyle Q.
3 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Examples of forms
(OpenClinica)

4 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
CDISC ODM XML Schema

ODM

METADATA
*Study

* MetadataVersion

DATA
*Clinical Data
*Subject Data

* Study Event Def
*Form Def
* ItemGroup Def
* Item Def

* Study Event Data

Form
Data types
Same OID

*Form Data
* ItemGroup Data
* Item Data

Data values
5 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Current use of AIBL data
Problem – Solution – Remaining challenges

•
•
•
•

Conversion in tabular format (Excel or CSV)
Browser-based exploration tool
Additional processing via Excel or R or …
(different toolset for AIBL and ADNI)

6 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Primary motivation: add new dimensions
{ts} observation

Main

Time (scale = second)

{cs}

Subject

Date (scale = day)

{dataset}
before,after,at

Study Event
(scale = phase)

Month (scale = month)

Node
Product

AMT Trade
Product id

main

cm
Theme

{ds}

Sub-theme

vs

Variable
{v}

7 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort

Medication

AMT Med.
Product id
SNOMED
Substance id
WHO CC ATC DDD
{atc}
Is this a Semantic Stats problem?
Problem – Solution – Remaining challenges

• Yes!
• Transition from monolithic tree structure to multidimensional data cube  RDF Data Cube Vocabulary
(and associated SDMX best practices for slicing it)
• Complex study structure plus user-defined data types
 DDI-RDF Discovery (and associated DDI Best
practices)

8 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
RDF Data cube http://purl.org/linked-data/cube
• RDF Data Cube (qb): a method to organise linked data in slices
• A vocabulary published by the W3C Government Linked Data (GLD) Working
Group (Working Draft)
• Also the method used to publish statistics data and environmental data in
Europe e.g. for Bathing Water Quality in UK
http://www.epimorphics.com/web/projects/bathing-water-quality

• Advantages
• Allows multiple views on the same data (similar to OLAP)
• Generic approach which supports the links to domain-specific definitions

• Useable:
• In any browser via Linked Data API (HTML output)
• In JavaScript via Linked Data API (JSON output)
• In R via SPARQL

9 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
QB (Candidate Rec. version)

10 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
DDI-RDF Discovery
• DDI-RDF Discovery Vocabulary (disco): a metadata vocabulary for
documenting research and survey data
• Derived from the Data Documentation Initiative standards
– DDI and W3C people
– two Dagstuhl Seminars
• Designed as a complement to existing vocabularies developed by W3C
community (Dublin Core, SKOS, XKOS, DCAT (data catalogue), RDF Data Cube

• Work in progress
• At least half of it not discussed in this talk - Dataset statistics
• Useful if we want to attach statistical data to slices …

11 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Disco (study description)

12 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
QB, Disco and ODM
ODM

disco:LogicalDataset
Clinical
Data

qb:Dataset

Study

Subject
Data
qb:Slice

Metadata
Study
Event Def

Study
Event Data
qb:Slice

Form
Data

disco:Universe
ItemGroup
Data

qb:Observation

ItemGroup
Def

Item
Data
disco:Variable

13 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort

Form
Def

Item
Def
Disco:VariableDefinition
Is this a Semantic Stats problem?
Problem – Solution – Remaining challenges

• But …
• We have more than one data cube …
• 25 after elimination of privacy-sensitive data: patient
details, doctor details, …

14 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Clinical

Study
Screening
(Telephone)

Blood

Main cube

CSF
MCI Patient
Vital Signs
AD Patient
Medical History

Personal
Info
Assessment
Progress

Family History
Medication

Lifestyle

1655
variables

Demographics

Neuropsychiatric
Inventory
Examination

Actigraph
Physical Activity
Food frequencies

Cognitive

Imaging

Food intakes

PET
-PIB

Food nutrients

Neuropsychological
Battery
Memory
Complaint
Questionnaire

MRI
Alcoholic nutrients

Short IQ Code
Dexa

Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Is this a Semantic Stats problem?
Problem – Solution – Remaining challenges

• Solution: Nested Data Cubes
– Based on big table defining which variables go in
which cubes

16 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Product -> Study Phase ->
Universe ->
(Date) -> (Time)

Solution: Nested data cubes

3,4
Cross-section

Participant <- Node

1,2
Time Series

A

Id-> Phase 7a
-> (Date. Time)

8a

Specialised
Vital Signs
Cube

Core obs.(links
to special. obs)

17 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort

Specialised
Medication Cube
Id-> Phase -> Variable ->
(Date. Time)
7b
7b
Slice

B

8b

Core AIBL
Cube

Vital Sign obs.

C
Medication obs.
Is this a Semantic Stats problem?
Problem – Solution – Remaining challenges

• Solution: Nested Data Cubes based on RDF Data Cube
vocabulary
– Maximum compatibility required to be able to reuse
tooling developed according to W3C specification
– Plus URI scheme

18 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Nested Data Cubes with QB
qb:dataSet
Main

qb:slice SERIES
Phase

Dated
SECTIONS
Subject
SLICES
Sub-theme

qb:slice
Specia
lised

qb:observation
Observation
Observation
:mainObservation
qb:observationGroup

SERIES
Sub-theme
SECTIONS
Sub-Theme

19 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort

ObservationGroup
ObservationGroup
qb:dataSet

:specialisedObservation

qb:observation
Observation
Observation
Main cube
Product series (pr)
Phase series (ph)

URI scheme
ROOT/{dataset}/ts/pr/{pr}
ROOT/{dataset}/ts/pr/{pr}/ph/{ph}

Dated series (dt)
Product section (pr)
Node section (nd)
Gender section (gd)

ROOT/{dataset}/ts/pr/{pr}/ph/{ph}/dt/{dt}
ROOT/{dataset}/cs/pr/{pr}
ROOT/{dataset}/cs/pr/{pr}/nd/{nd}
ROOT/{dataset}/cs/pr/{pr}/gd/{gd}

Subject section (su)
Product slices (pr)
Theme slices (th)

ROOT/{dataset}/cs/pr/{pr}/nd/{nd}/su/{su}
ROOT/{dataset}/ds/pr/{pr}
ROOT/{dataset}/ds/pr/{pr}/th/{th}

Sub-theme slices (st)
Observation groups

ROOT/{dataset}/ds/pr/{pr}/th/{th}/st/{st}
ROOT/{dataset}/pr/{pr}/ph/{ph}/su/{su}

20 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Access via SPARQL (+ Visual Box)

21 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Holes
Problem – Solution – Remaining challenges

• Survey-originated data more likely to have missing data
• Rule-based forms: if X … then display box to enter the value of Y
• Cases where the patient is no longer available for the study (deceased
patient or “lost”patient)

• Question (specs writer): how can you handle datasets with
holes?
• QB example: issue with some IC constraints (see paper)

• Question (tools developers): can you handle datasets with holes
(and help the user to avoid them and understand them)?

22 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Sensitive data
Problem – Solution – Remaining challenges

• Need to answer privacy issue with data like AIBL
• Proposal: add different identifiers for each specialised data
cube with links at the slice level?
• To allow browsing /exploration of one data cube at a time with lighter
research approval regime (to give enough information for specialist
working on data quality issues and for researchers to decide if they
need to apply to be granted full access).
• Question: implementation /access control and performance
constraints for queries accessing multiple cubes

23 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Conclusions
Problem – Solution – Remaining challenges

• Benefits of semantic statistics vocabularies
• Adoption of SDMX best practices (SDMX guidelines for DSDs
Statistical Data and Metadata Exchange 2012)
• Modularity
• Approach reusable for other domains

• Ongoing work on vocabulary mappings (Medications)

• Adoption by Pharma community (FDA/PhUSE)?

24 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
Thank you
CSIRO COMPUTATIONAL INFORMATICS
Laurent Lefort
Ontologist
t +61 2 9123 4567
e laurent.lefort@csiro.au

CSIRO COMPUTATIONAL INFORMATICS

Weitere ähnliche Inhalte

Was ist angesagt?

The eCrystals Federation
The eCrystals FederationThe eCrystals Federation
The eCrystals FederationManjulaPatel
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
 
Knowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents EnvironmentKnowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents EnvironmentManjulaPatel
 
Exploring Coverage and Distribution of Scholarly Identifiers on the Web
Exploring Coverage and Distribution of Scholarly Identifiers on the WebExploring Coverage and Distribution of Scholarly Identifiers on the Web
Exploring Coverage and Distribution of Scholarly Identifiers on the WebOpen Knowledge Maps
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
Integrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesIntegrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesManjulaPatel
 
Summary of data citation synthesis activity & Review
Summary of data citation synthesis activity & ReviewSummary of data citation synthesis activity & Review
Summary of data citation synthesis activity & ReviewMicah Altman
 
Preparing your data for sharing and publishing
Preparing your data for sharing and publishingPreparing your data for sharing and publishing
Preparing your data for sharing and publishingVarsha Khodiyar
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014iedadata
 
Repositories in an Open Data Ecosystem
Repositories in an Open Data EcosystemRepositories in an Open Data Ecosystem
Repositories in an Open Data EcosystemWolfgang Kuchinke
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceLizLyon
 
Alive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values StudyAlive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values StudyCESSDA Training
 
CNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationCNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationJohn Doove
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
PT-CRIS: EUROCRIS: 2013: Eco-system of research information systems
PT-CRIS: EUROCRIS: 2013: Eco-system of research information systemsPT-CRIS: EUROCRIS: 2013: Eco-system of research information systems
PT-CRIS: EUROCRIS: 2013: Eco-system of research information systemsJoão Mendes Moreira
 
Pt cris-casrai-rome-16-may-2014-v1
Pt cris-casrai-rome-16-may-2014-v1Pt cris-casrai-rome-16-may-2014-v1
Pt cris-casrai-rome-16-may-2014-v1João Mendes Moreira
 
Using the Research Graph and Data Switchboard for cross-platform discovery
Using the Research Graph and Data Switchboard for cross-platform discoveryUsing the Research Graph and Data Switchboard for cross-platform discovery
Using the Research Graph and Data Switchboard for cross-platform discoveryamiraryani
 

Was ist angesagt? (20)

The eCrystals Federation
The eCrystals FederationThe eCrystals Federation
The eCrystals Federation
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
Knowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents EnvironmentKnowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents Environment
 
Exploring Coverage and Distribution of Scholarly Identifiers on the Web
Exploring Coverage and Distribution of Scholarly Identifiers on the WebExploring Coverage and Distribution of Scholarly Identifiers on the Web
Exploring Coverage and Distribution of Scholarly Identifiers on the Web
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Integrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesIntegrated research data management in the Structural Sciences
Integrated research data management in the Structural Sciences
 
Summary of data citation synthesis activity & Review
Summary of data citation synthesis activity & ReviewSummary of data citation synthesis activity & Review
Summary of data citation synthesis activity & Review
 
Working with Global Infrastructure at a National Level
Working with Global Infrastructure at a National LevelWorking with Global Infrastructure at a National Level
Working with Global Infrastructure at a National Level
 
Welcome to HDF Workshop V
Welcome to HDF Workshop VWelcome to HDF Workshop V
Welcome to HDF Workshop V
 
Preparing your data for sharing and publishing
Preparing your data for sharing and publishingPreparing your data for sharing and publishing
Preparing your data for sharing and publishing
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
 
Repositories in an Open Data Ecosystem
Repositories in an Open Data EcosystemRepositories in an Open Data Ecosystem
Repositories in an Open Data Ecosystem
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalface
 
Alive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values StudyAlive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values Study
 
Brislinger, Recker: Keeping data re-usable in the evs
Brislinger, Recker: Keeping data re-usable in the evsBrislinger, Recker: Keeping data re-usable in the evs
Brislinger, Recker: Keeping data re-usable in the evs
 
CNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationCNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundation
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
PT-CRIS: EUROCRIS: 2013: Eco-system of research information systems
PT-CRIS: EUROCRIS: 2013: Eco-system of research information systemsPT-CRIS: EUROCRIS: 2013: Eco-system of research information systems
PT-CRIS: EUROCRIS: 2013: Eco-system of research information systems
 
Pt cris-casrai-rome-16-may-2014-v1
Pt cris-casrai-rome-16-may-2014-v1Pt cris-casrai-rome-16-may-2014-v1
Pt cris-casrai-rome-16-may-2014-v1
 
Using the Research Graph and Data Switchboard for cross-platform discovery
Using the Research Graph and Data Switchboard for cross-platform discoveryUsing the Research Graph and Data Switchboard for cross-platform discovery
Using the Research Graph and Data Switchboard for cross-platform discovery
 

Ähnlich wie Design and generation of Linked Clinical Data Cube (Semantic Stats 2013)

IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesDr.-Ing. Thomas Hartmann
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 Scott Edmunds
 
"Data in Context" IG sessions @ RDA 3rd Plenary
"Data in Context" IG sessions @  RDA 3rd Plenary"Data in Context" IG sessions @  RDA 3rd Plenary
"Data in Context" IG sessions @ RDA 3rd PlenaryBrigitte Jörg
 
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...Brigitte Jörg
 
Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016Jisc RDM
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Jeroen Rombouts
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Philip Bourne
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanPhilippe Rocca-Serra
 
Chemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopChemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopMarcus Hanwell
 
Sla poster rdm_draft_20140529_v04
Sla poster rdm_draft_20140529_v04Sla poster rdm_draft_20140529_v04
Sla poster rdm_draft_20140529_v04ubcphysioblog
 
Data Management Lab: Session 2 slides
Data Management Lab: Session 2 slidesData Management Lab: Session 2 slides
Data Management Lab: Session 2 slidesIUPUI
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram Ludäscher
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data SharingAnita de Waard
 
RDM Roadmap to the Future, or: Lords and Ladies of the Data
RDM Roadmap to the Future, or: Lords and Ladies of the DataRDM Roadmap to the Future, or: Lords and Ladies of the Data
RDM Roadmap to the Future, or: Lords and Ladies of the DataRobin Rice
 

Ähnlich wie Design and generation of Linked Clinical Data Cube (Semantic Stats 2013) (20)

IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
 
HDF5 FastQuery
HDF5 FastQueryHDF5 FastQuery
HDF5 FastQuery
 
Baker - Evolution of Data Products and Designated Audiences
Baker - Evolution of Data Products and Designated AudiencesBaker - Evolution of Data Products and Designated Audiences
Baker - Evolution of Data Products and Designated Audiences
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
"Data in Context" IG sessions @ RDA 3rd Plenary
"Data in Context" IG sessions @  RDA 3rd Plenary"Data in Context" IG sessions @  RDA 3rd Plenary
"Data in Context" IG sessions @ RDA 3rd Plenary
 
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
 
Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016
 
Challenges in medical imaging and the VISCERAL model
Challenges in medical imaging and the VISCERAL modelChallenges in medical imaging and the VISCERAL model
Challenges in medical imaging and the VISCERAL model
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
 
Chemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopChemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the Desktop
 
Sla poster rdm_draft_20140529_v04
Sla poster rdm_draft_20140529_v04Sla poster rdm_draft_20140529_v04
Sla poster rdm_draft_20140529_v04
 
Data Management Lab: Session 2 slides
Data Management Lab: Session 2 slidesData Management Lab: Session 2 slides
Data Management Lab: Session 2 slides
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
RDM Roadmap to the Future, or: Lords and Ladies of the Data
RDM Roadmap to the Future, or: Lords and Ladies of the DataRDM Roadmap to the Future, or: Lords and Ladies of the Data
RDM Roadmap to the Future, or: Lords and Ladies of the Data
 

Mehr von Laurent Lefort

Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...Laurent Lefort
 
Linked Sensor Data cube
Linked Sensor Data cubeLinked Sensor Data cube
Linked Sensor Data cubeLaurent Lefort
 
Future manufacturing informatics - typology of manufacturing data
Future manufacturing informatics - typology of manufacturing dataFuture manufacturing informatics - typology of manufacturing data
Future manufacturing informatics - typology of manufacturing dataLaurent Lefort
 
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...Laurent Lefort
 
Semantically enabled standard development
Semantically enabled standard developmentSemantically enabled standard development
Semantically enabled standard developmentLaurent Lefort
 
Standards for Semantic Mashups
Standards for Semantic MashupsStandards for Semantic Mashups
Standards for Semantic MashupsLaurent Lefort
 
Semantic Web For Hack Days
Semantic Web For Hack DaysSemantic Web For Hack Days
Semantic Web For Hack DaysLaurent Lefort
 

Mehr von Laurent Lefort (8)

Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
 
Linked Sensor Data cube
Linked Sensor Data cubeLinked Sensor Data cube
Linked Sensor Data cube
 
Future manufacturing informatics - typology of manufacturing data
Future manufacturing informatics - typology of manufacturing dataFuture manufacturing informatics - typology of manufacturing data
Future manufacturing informatics - typology of manufacturing data
 
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
 
Govhack cached
Govhack cachedGovhack cached
Govhack cached
 
Semantically enabled standard development
Semantically enabled standard developmentSemantically enabled standard development
Semantically enabled standard development
 
Standards for Semantic Mashups
Standards for Semantic MashupsStandards for Semantic Mashups
Standards for Semantic Mashups
 
Semantic Web For Hack Days
Semantic Web For Hack DaysSemantic Web For Hack Days
Semantic Web For Hack Days
 

Kürzlich hochgeladen

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Kürzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Design and generation of Linked Clinical Data Cube (Semantic Stats 2013)

  • 1. Design and generation of Linked Clinical Data Cube Laurent Lefort and Hugo Leroux 1st Workshop on Semantic Statistics, 22 October 2013 CSIRO COMPUTATIONAL INFORMATICS
  • 2. Australian Imaging Biomarkers and Lifestyle data Problem – Solution – Remaining challenges • Data collected for the AIBL study • (aibl.csiro.au) • For the early detection of Alzheimer’s Disease • Protocol aligned with Alzheimer’s Disease Neuroimaging Initiative (ADNI) • Plus nutrition, lifestyle • Microdata collected via electronic data capture tool (OpenClinica) • To be consumed by researchers • 1) discovery + data quality assessment (fix) • 2) production of publishable results • Original format (export format): CDISC ODM • Clinical Data Interchange Standards Consortium • Operational Data Model 2 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 3. AIBL Protocol and e-data capture Vital Signs Blood Neuropsychological t. Demographics Medications PiB PET scan and MRI for ¼ Diagnostic Summary Diet and lifestyle Q. 3 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 4. Examples of forms (OpenClinica) 4 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 5. CDISC ODM XML Schema ODM METADATA *Study * MetadataVersion DATA *Clinical Data *Subject Data * Study Event Def *Form Def * ItemGroup Def * Item Def * Study Event Data Form Data types Same OID *Form Data * ItemGroup Data * Item Data Data values 5 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 6. Current use of AIBL data Problem – Solution – Remaining challenges • • • • Conversion in tabular format (Excel or CSV) Browser-based exploration tool Additional processing via Excel or R or … (different toolset for AIBL and ADNI) 6 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 7. Primary motivation: add new dimensions {ts} observation Main Time (scale = second) {cs} Subject Date (scale = day) {dataset} before,after,at Study Event (scale = phase) Month (scale = month) Node Product AMT Trade Product id main cm Theme {ds} Sub-theme vs Variable {v} 7 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort Medication AMT Med. Product id SNOMED Substance id WHO CC ATC DDD {atc}
  • 8. Is this a Semantic Stats problem? Problem – Solution – Remaining challenges • Yes! • Transition from monolithic tree structure to multidimensional data cube  RDF Data Cube Vocabulary (and associated SDMX best practices for slicing it) • Complex study structure plus user-defined data types  DDI-RDF Discovery (and associated DDI Best practices) 8 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 9. RDF Data cube http://purl.org/linked-data/cube • RDF Data Cube (qb): a method to organise linked data in slices • A vocabulary published by the W3C Government Linked Data (GLD) Working Group (Working Draft) • Also the method used to publish statistics data and environmental data in Europe e.g. for Bathing Water Quality in UK http://www.epimorphics.com/web/projects/bathing-water-quality • Advantages • Allows multiple views on the same data (similar to OLAP) • Generic approach which supports the links to domain-specific definitions • Useable: • In any browser via Linked Data API (HTML output) • In JavaScript via Linked Data API (JSON output) • In R via SPARQL 9 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 10. QB (Candidate Rec. version) 10 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 11. DDI-RDF Discovery • DDI-RDF Discovery Vocabulary (disco): a metadata vocabulary for documenting research and survey data • Derived from the Data Documentation Initiative standards – DDI and W3C people – two Dagstuhl Seminars • Designed as a complement to existing vocabularies developed by W3C community (Dublin Core, SKOS, XKOS, DCAT (data catalogue), RDF Data Cube • Work in progress • At least half of it not discussed in this talk - Dataset statistics • Useful if we want to attach statistical data to slices … 11 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 12. Disco (study description) 12 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 13. QB, Disco and ODM ODM disco:LogicalDataset Clinical Data qb:Dataset Study Subject Data qb:Slice Metadata Study Event Def Study Event Data qb:Slice Form Data disco:Universe ItemGroup Data qb:Observation ItemGroup Def Item Data disco:Variable 13 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort Form Def Item Def Disco:VariableDefinition
  • 14. Is this a Semantic Stats problem? Problem – Solution – Remaining challenges • But … • We have more than one data cube … • 25 after elimination of privacy-sensitive data: patient details, doctor details, … 14 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 15. Clinical Study Screening (Telephone) Blood Main cube CSF MCI Patient Vital Signs AD Patient Medical History Personal Info Assessment Progress Family History Medication Lifestyle 1655 variables Demographics Neuropsychiatric Inventory Examination Actigraph Physical Activity Food frequencies Cognitive Imaging Food intakes PET -PIB Food nutrients Neuropsychological Battery Memory Complaint Questionnaire MRI Alcoholic nutrients Short IQ Code Dexa Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 16. Is this a Semantic Stats problem? Problem – Solution – Remaining challenges • Solution: Nested Data Cubes – Based on big table defining which variables go in which cubes 16 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 17. Product -> Study Phase -> Universe -> (Date) -> (Time) Solution: Nested data cubes 3,4 Cross-section Participant <- Node 1,2 Time Series A Id-> Phase 7a -> (Date. Time) 8a Specialised Vital Signs Cube Core obs.(links to special. obs) 17 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort Specialised Medication Cube Id-> Phase -> Variable -> (Date. Time) 7b 7b Slice B 8b Core AIBL Cube Vital Sign obs. C Medication obs.
  • 18. Is this a Semantic Stats problem? Problem – Solution – Remaining challenges • Solution: Nested Data Cubes based on RDF Data Cube vocabulary – Maximum compatibility required to be able to reuse tooling developed according to W3C specification – Plus URI scheme 18 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 19. Nested Data Cubes with QB qb:dataSet Main qb:slice SERIES Phase Dated SECTIONS Subject SLICES Sub-theme qb:slice Specia lised qb:observation Observation Observation :mainObservation qb:observationGroup SERIES Sub-theme SECTIONS Sub-Theme 19 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort ObservationGroup ObservationGroup qb:dataSet :specialisedObservation qb:observation Observation Observation
  • 20. Main cube Product series (pr) Phase series (ph) URI scheme ROOT/{dataset}/ts/pr/{pr} ROOT/{dataset}/ts/pr/{pr}/ph/{ph} Dated series (dt) Product section (pr) Node section (nd) Gender section (gd) ROOT/{dataset}/ts/pr/{pr}/ph/{ph}/dt/{dt} ROOT/{dataset}/cs/pr/{pr} ROOT/{dataset}/cs/pr/{pr}/nd/{nd} ROOT/{dataset}/cs/pr/{pr}/gd/{gd} Subject section (su) Product slices (pr) Theme slices (th) ROOT/{dataset}/cs/pr/{pr}/nd/{nd}/su/{su} ROOT/{dataset}/ds/pr/{pr} ROOT/{dataset}/ds/pr/{pr}/th/{th} Sub-theme slices (st) Observation groups ROOT/{dataset}/ds/pr/{pr}/th/{th}/st/{st} ROOT/{dataset}/pr/{pr}/ph/{ph}/su/{su} 20 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 21. Access via SPARQL (+ Visual Box) 21 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 22. Holes Problem – Solution – Remaining challenges • Survey-originated data more likely to have missing data • Rule-based forms: if X … then display box to enter the value of Y • Cases where the patient is no longer available for the study (deceased patient or “lost”patient) • Question (specs writer): how can you handle datasets with holes? • QB example: issue with some IC constraints (see paper) • Question (tools developers): can you handle datasets with holes (and help the user to avoid them and understand them)? 22 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 23. Sensitive data Problem – Solution – Remaining challenges • Need to answer privacy issue with data like AIBL • Proposal: add different identifiers for each specialised data cube with links at the slice level? • To allow browsing /exploration of one data cube at a time with lighter research approval regime (to give enough information for specialist working on data quality issues and for researchers to decide if they need to apply to be granted full access). • Question: implementation /access control and performance constraints for queries accessing multiple cubes 23 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 24. Conclusions Problem – Solution – Remaining challenges • Benefits of semantic statistics vocabularies • Adoption of SDMX best practices (SDMX guidelines for DSDs Statistical Data and Metadata Exchange 2012) • Modularity • Approach reusable for other domains • Ongoing work on vocabulary mappings (Medications) • Adoption by Pharma community (FDA/PhUSE)? 24 | Design and Generation of Linked Clinical Data Cubes | Laurent Lefort
  • 25. Thank you CSIRO COMPUTATIONAL INFORMATICS Laurent Lefort Ontologist t +61 2 9123 4567 e laurent.lefort@csiro.au CSIRO COMPUTATIONAL INFORMATICS

Hinweis der Redaktion

  1. This talk has three parts:- The problem we are trying to solve and why Sem Stats vocabularies are relevant- What we have done and learned so farWhat’s nextaibl.csiro.au-
  2. AIBL BP, HR, weight, height, abdominal girth80 ml blood2 hours neuropsychological testingHADS and GDSMedication listDiet and lifestyle questionnairesPiB PET scan and MRI for ¼Diagnostic panel evaluationDA file reviewRepeat every 18 months
  3. Working Draft http://www.w3.org/TR/vocab-data-cube/