Awakening Clinical Data: Semantics for Scalable Medical Research Informatics
Satya S. Sahoo
Division of Medical Informatics
Electrical Engineering and Computer Science Department
Case Western Reserve University
Cleveland, OH, USA
Big Picture of Data in Clinical Research
•  Patient Reports: 143,961 patients per year (e.g. Emory) (source: PRISM project CWRU)
•  MRI, PET scans: MRI 50-100MB, PET 60-100MB (source: PRISM project, BME dept CWRU)
•  Epilepsy Monitoring Unit (EMU) Data: 500-600MB per patient per stay in EMU
•  Polysomnograms: 1-20GB each (source: Physio-MIMI, PRISM CWRU)
•  Pathology Reports, Tissue Bank (source: NLM and Wikipedia)
•  Wireless Health Data: ~5.6 billion wireless connections and growing (source: CWRU School of Engineering)
•  National Sleep Research Resource: 500 TB
•  Case Western EMU: 250 TB
•  Ultra large volume of data and growing rapidly
•  Data is Multi-modal, Heterogeneous
•  Heterogeneity: Syntactic, Structural, Semantic
Scalability in Medical Informatics: Beyond Volume
Exemplar: Sleep Medicine Research
•  Multi-Center Studies with differing administrative requirements – business logic
•  Dynamic data – grows over project duration
•  Data Semantics as foundation to support a wide spectrum of users – clinicians, nurse
   practitioners, research fellows
A Wish List for Scalable Clinical Data Management
•  Reconcile Data Heterogeneity – most critical to successful
   translational research
   o  Syntactic heterogeneity – less of a problem, data dictionaries
      help
   o  Structural heterogeneity – problematic, XML somewhat helpful
   o  Semantic heterogeneity – a huge problem, ontologies to the
      rescue?
•  Provenance – essential for data quality, compliance, insight
   o  Blood Oxygen Baseline: oxygen saturation during the first 15 or
      30 seconds of sleep
   o  Last month's patient blood report as the cause of a change in medication
      – Domain Provenance (not just tuple provenance)
•  Intuitive access to information – clinical trials eligibility,
   cohort identification
•  Scalable – data sources and research partners added or removed
   dynamically
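
The Blood Oxygen Baseline item above can be made concrete with a minimal Python sketch. It assumes a hypothetical saturation trace sampled once per second from sleep onset; the function name `spo2_baseline`, the `BaselineResult` record, and the sample values are illustrative and are not taken from Physio-MIMI. The only point is that the same signal yields different baselines under the 15-second and 30-second definitions, so the definition applied has to travel with the value as domain provenance.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class BaselineResult:
    value: float          # mean SpO2 (%) over the window
    window_seconds: int   # domain provenance: which definition was applied


def spo2_baseline(samples, window_seconds, sample_rate_hz=1):
    """Mean oxygen saturation over the first `window_seconds` of sleep."""
    n = window_seconds * sample_rate_hz
    if n <= 0 or len(samples) < n:
        raise ValueError("not enough samples for the requested window")
    return BaselineResult(value=mean(samples[:n]), window_seconds=window_seconds)


# Hypothetical 1 Hz trace: 15 s at 98 %, then 15 s at 94 %.
trace = [98.0] * 15 + [94.0] * 15

site_a = spo2_baseline(trace, window_seconds=15)  # one center's definition
site_b = spo2_baseline(trace, window_seconds=30)  # another center's definition
print(site_a)
print(site_b)
```

Without the `window_seconds` field, the two values would look like a disagreement in the data rather than a difference in definition.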
A “not to do” list for Clinical Data Management
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch]
•  No Linked Open Patient Data – HIPAA, HITECH
   Act (US), Data Protection Act (UK)
   o  De-identified data – IRB approval
•  Ontology as global schema – but no RDF
   o  Vast majority of data is in RDBs
   o  Practical issues with RDF – URIs cannot be institution-specific (privacy)
Physio-MIMI: Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research
[Diagram labels: Clinical Researcher; Sleep Domain Ontology alongside SNOMED-CT, FMA, OGMS, …; any number of new centers]
Physio-MIMI: Enabling Scalable Medical Research
•  NCRR‐funded, multi‐CTSA site project: Sleep medicine as
   exemplar
•  Federated data management – scalable, adapts to changing
   data access policies
•  Ontology-driven:
   o  Data mappings – Ontology class to data dictionary terms
      (manually curated)
   o  Drive query interface
   o  Manage provenance
•  Privacy aware, IRB-compliant
•  Collaboration among Case Western, U. of Michigan,
   Marshfield Clinic and U. of Wisconsin, Madison
   o  Now Harvard Medical School
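
Federated data management under changing access policies, as listed above, might look roughly like the following sketch. Everything in it is an assumption for illustration: the `Site` class, the `allows_counts` flag, the site names, and the in-memory records (with a hypothetical apnea-hypopnea index field `ahi`) stand in for real site databases and IRB-governed policies, which the slides do not describe. Each site evaluates the query locally and returns only a count, so patient-level rows never leave the site.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class Site:
    name: str
    records: List[Record] = field(default_factory=list)
    allows_counts: bool = True  # hypothetical policy flag; may change over time

    def count(self, predicate: Callable[[Record], bool]) -> int:
        """Run the predicate locally and return only an aggregate count."""
        if not self.allows_counts:
            raise PermissionError(f"{self.name} does not currently share counts")
        return sum(1 for r in self.records if predicate(r))


def federated_count(sites, predicate):
    """Query every site; skip sites whose policy currently forbids access."""
    counts, skipped = {}, []
    for site in sites:
        try:
            counts[site.name] = site.count(predicate)
        except PermissionError:
            skipped.append(site.name)
    return counts, skipped


# Hypothetical sites and records.
sites = [
    Site("site_a", [{"ahi": 22.0, "age": 54}, {"ahi": 3.0, "age": 61}]),
    Site("site_b", [{"ahi": 31.5, "age": 47}]),
    Site("site_c", [{"ahi": 18.0, "age": 70}], allows_counts=False),
]

counts, skipped = federated_count(sites, lambda r: r["ahi"] >= 15 and r["age"] >= 45)
print(counts, skipped)
```

Because `sites` is an ordinary list, a center can be appended or removed between queries without touching the query code, which is the sense of "scalable" this sketch tries to capture.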
Key Resource: Sleep Domain Ontology (SDO)
           https://mimi.case.edu/concepts
Data Mappings: SDO to Data Dictionary
Physio-Map Module
•  Visual interface
•  Stores mappings in XML – moving towards rules
•  Dynamically executed in response to user query
User Voting
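
As a rough illustration of mappings stored in XML and executed at query time, the sketch below uses Python's standard `xml.etree.ElementTree`. The XML layout, the ontology class name `sdo:ApneaHypopneaIndex`, and the site, table, and column names are invented for this example; the actual Physio-Map schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document: one ontology class, one term per site.
MAPPINGS_XML = """
<mappings>
  <mapping class="sdo:ApneaHypopneaIndex">
    <term site="site_a" table="psg_summary" column="ahi_total"/>
    <term site="site_b" table="sleep_study" column="AHI"/>
  </mapping>
</mappings>
"""


def load_mappings(xml_text):
    """Return {ontology_class: {site: (table, column)}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for mapping in root.findall("mapping"):
        terms = {
            t.get("site"): (t.get("table"), t.get("column"))
            for t in mapping.findall("term")
        }
        result[mapping.get("class")] = terms
    return result


def translate(ontology_class, operator, mappings):
    """Rewrite a query on an ontology class into one SQL template per site."""
    if operator not in {"<", "<=", "=", ">=", ">"}:
        raise ValueError(f"unsupported operator: {operator}")
    return {
        site: f"SELECT COUNT(*) FROM {table} WHERE {column} {operator} ?"
        for site, (table, column) in mappings[ontology_class].items()
    }


mappings = load_mappings(MAPPINGS_XML)
queries = translate("sdo:ApneaHypopneaIndex", ">=", mappings)
for site, sql in queries.items():
    print(site, "->", sql)
```

The user asks one question in ontology terms; the mapping layer produces a site-specific query for each data dictionary, with the comparison value left as a bound parameter.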
Provenance: Contextual Metadata for Clinical Research
Slide courtesy: Remo Mueller
Provenance: To Trace Variations in Data and Results
Slide courtesy: Remo Mueller
Modified from slide courtesy: Remo Mueller
Provenance: Source information for Patient Data
Slide courtesy: Remo Mueller
Intuitive Query Interface: Ontology (SDO)-driven Visual Aggregator and Explorer (VisAgE)
[Screenshot labels: DataSets; Ontology Concept – Type of Query Widget]
Physio-MIMI in National Sleep Research Resource
•  National Sleep Research Resource (NSRR) – scored and
   awaiting funding review
•  Collaboration between Harvard Medical School (domain
   experts) and Case Western (CS) with 15 projects
    o  50,000 sleep research studies – total size of 500TB
•  Semantic Data Integration – SDO and Sleep Provenance
   Ontology (extending W3C PROV Ontology PROV-O)
•  Signal processing tools – using a common format called
   European Data Format (EDF), XML-based
•  Domain analysis, cross-linking – secure Web access
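
A quick sanity check on the figures above, written as a few lines of Python: dividing the stated 500 TB by the stated 50,000 studies gives the average size per study. Decimal units (1 TB = 1,000 GB) are an assumption, since the slides do not say which convention is used.

```python
TOTAL_TB = 500       # stated total size of the resource
STUDIES = 50_000     # stated number of sleep research studies
GB_PER_TB = 1_000    # assumption: decimal units

avg_gb_per_study = TOTAL_TB * GB_PER_TB / STUDIES
print(f"average size per study: {avg_gb_per_study:.1f} GB")
```

The resulting average of 10 GB falls inside the 1-20GB range quoted earlier for a single polysomnogram.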
Challenges: Semantics in Large Scale Clinical Data
•  Incentives for adopting RDF in clinical data management
   – what is not already possible in RDB?
•  OWL2, RDFS reasoning – Privacy aware reasoning,
   semantics-aware access control (Nguyen et al. 2012)
•  Missing Semantics?
    o  Variable, missing provenance in original study – re-create
       provenance with (limited) provenance?
    o  Fine-level granularity for semantic annotation of
       signal data – currently not scalable
•  A little semantics does not go too far in clinical data
    o  Need for greater involvement of Semantic Web
       community in development of EHR systems
Acknowledgements
•  Guo-Qiang Zhang, Remo Mueller, Samden Lhatoo, Susan Redline, Alireza Bozorgi
•  Division of Medical Informatics: Lingyun Luo, Joe Teagno, Meng Zhao, Jake Luo,
   Licong Cui, Chien-Hung Chen, Catherine Jayapandian
•  Physio-MIMI Team: http://physiomimi.case.edu/
•  Contact Information: satya.sahoo@case.edu,
   http://cci.case.edu/cci/index.php/Satya_Sahoo
