Imaging Data Commons (IDC):
Introduction and initial
approach
Andrey Fedorov, PhD, on behalf of the IDC consortium
Brigham and Women’s Hospital / Harvard Medical School
andrey.fedorov@gmail.com
Oct 7, 2019 - NCI Imaging Community call
Slides location: http://bit.ly/2019-idc-imgcommcall
“The NCI Imaging Data Commons will be a
cloud-based resource that connects
researchers with
1. cancer image collections
2. a robust infrastructure that contains imaging data,
subject and sample metadata and experimental
metadata from disparate sources
3. resources for searching, identifying and viewing
images, and
4. additional data types contained in other Cancer
Research Data Commons nodes.”
Cancer Research Data Commons (CRDC)
Imaging Data Commons (IDC)
IDC timeline to the award
● April 2018: request for information on
IDC development
● December 11, 2018: IDC solicitation for
competitive proposals released
● January 31, 2019: IDC proposals due
date
● July 24, 2019: IDC contract mutually
signed between Leidos Biomed and
BWH
IDC: The team
● Leadership:
○ Ron Kikinis*, Principal Investigator
○ Andrey Fedorov, Technical lead and project
manager
● Imaging R&D:
○ BWH: Ron Kikinis, Andrey Fedorov, Hugo Aerts
○ Isomics Inc: Steve Pieper
○ Radical Imaging: Rob Lewis, Erik Ziegler
○ Pixelmed: David Clunie (aka “Mr. DICOM”)
○ Fraunhofer MEVIS: André Homeyer
● Cloud:
○ Institute for Systems Biology: Bill Longabaugh
● Security:
○ General Dynamics IT: David Pot
* Ron Kikinis is 50/50 between BWH and MEVIS at the moment, 100%
at BWH effective March 2020
[Team photos: Bill Longabaugh, Ron Kikinis, Andrey Fedorov, Steve Pieper, David Clunie, Hugo Aerts, David Pot, André Homeyer, Rob Lewis]
Team background
● Open source image computing technology
○ 3D Slicer, OHIF, pyradiomics, dcmqi
● Network projects and collaborations
○ Quantitative Imaging Network (QIN)
○ Informatics Technology for Cancer Research (ITCR)
● Collaborations with TCIA
○ Data harmonization efforts (segmentations, LIDC)
● DICOM development
○ Refinement of the standard to address quantitative
imaging use case
○ Tools, outreach, industry collaborations
● Cloud infrastructure development
○ ISB-CGC: one of the three Cancer Genomics Cloud pilots
Understanding the problem
CrowdFlower 2016 Data Science Report
● Imaging data = images + annotations + clinical
data ( + analysis results)
○ Current community focus is on images
○ Data preparation is very time consuming
○ Limited effort to support “organic growth” of the datasets
● Need multi-site, multi-reader, multi-tool,
representative cohorts
○ Cannot be done without conventions for data
representation
○ Requires tools to support harmonization efforts
○ Hard if not impossible to do retrospectively
● Need harmonization for
○ Semantics, data representation, communication interfaces
● CRDC takes those problems to a new level
IDC vision
● First priority: resource useful for imaging researchers
● Opportunity to address limitations and develop missing components
○ Visualization, search, data harmonization (where “data” is not limited to “images”!)
● Empower imaging research with metadata
○ Harmonize imaging, image derived and image related data
○ Provenance
○ Search
○ FAIR
● Initial goals: radiology and pathology
● Simplify accessibility of the already popular tools
● Simplify analysis workflows
● Lead by example: development of use cases is part of the project plan
● Longer term: cross-node integration
IDC and The Cancer Imaging Archive (TCIA)
● “TCIA is a service which de-identifies and hosts a large archive of medical
images of cancer accessible for public download.”
● Mostly clinical imaging data (radiology and digital pathology)
● IDC will:
○ Be part of the CRDC ecosystem, with the objective to support cross-domain data integration
○ Utilize public collections of TCIA to populate first wave of content, and will host public
collections of the TCIA going forward
○ Rely on TCIA for image de-identification
○ Collaborate with TCIA on the topics of data harmonization and development of tools of shared
interest
○ Encourage users to compute on the cloud by providing various incentives (e.g., compute
credits)
○ Discourage downloads of the data by not offsetting the data egress charges
IDC implementation - guiding principles
● Agile development approach
○ Broad initial direction, adaptive
○ Customer involvement
○ Short development cycles - sprints
● Phased implementation plan
○ Phase I: Pre Minimal Viable Product
○ Phase II: Minimal Viable Product
○ Phase III: Production / Further development
○ Phase IV: Further development / Maintenance
● IDC will host only public datasets, will NOT de-identify your data!
● Non-restrictive open source license for everything produced by IDC
● “Bag of tools” instead of monolithic development from scratch
● Standards-based
Phase I: Pre Minimal Viable Product
RFP-defined broad tasks:
● Definition of imaging data model, data dictionaries and ontologies
● Initial dataset and use-case definition with cross-collection data access
● Image data, metadata and file standards
● Evaluation of existing software and tools for reuse
● IDC advisory committee selection
Target completion: October 2019
IDC data backbone: DICOM
● Digital Imaging and Communications in Medicine (DICOM) is the standard for
communication of medical imaging information and related data
○ emphasis on metadata standardization
○ compatibility with acquisition and archival tools
○ images (CT, MR, whole slide pathology) and analysis results (annotations, segmentation,
registration, quantification)
○ interoperability
○ coordinated with other standards (HL7/FHIR, JSON, XML, REST, WADO, BRIDG, SNOMED)
● History of development and adoption since 1983
● Adopted by virtually all manufacturers of medical imaging equipment
● Open international community of stakeholders
● DICOM is a live standard!
● IDC opportunities: raise awareness, create incentives, help with the transition
DICOM for data modeling
● DICOM files: combine attributes
from several real-world entities
○ Patient
○ Equipment
○ Modality-specific attributes
● Tables of attributes based on
modules
○ Support incremental growth of content
● Unique identifiers
● Specific tasks:
○ Machine-readable representation
○ Data model search interface
○ Performance evaluation
○ Tools
http://dicom.nema.org/medical/dicom/current/output/chtml/part03/chapter_A.html
DICOM Composite Instance IOD Information Model
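The composite information model referenced above can be illustrated with a small sketch: a single DICOM instance header mixes attributes that describe different real-world entities, and grouping them by entity recovers the patient / study / series / instance hierarchy. This is a minimal Python illustration using plain dictionaries, not IDC code. The attribute keywords are standard DICOM keywords, but the grouping in `ENTITY_ATTRIBUTES` is a simplified assumption, and all identifiers and values are made up.

```python
from collections import defaultdict

# Simplified, assumed mapping of real-world entities to DICOM attribute keywords.
ENTITY_ATTRIBUTES = {
    "Patient": ["PatientID", "PatientSex"],
    "Study": ["StudyInstanceUID", "StudyDate"],
    "Series": ["SeriesInstanceUID", "Modality"],
    "Equipment": ["Manufacturer"],
    "Instance": ["SOPInstanceUID"],
}


def split_by_entity(header):
    """Group the flat attributes of one instance header by entity."""
    return {
        entity: {k: header[k] for k in keywords if k in header}
        for entity, keywords in ENTITY_ATTRIBUTES.items()
    }


def build_hierarchy(headers):
    """Nest instance UIDs under their patient, study and series identifiers."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for h in headers:
        series = tree[h["PatientID"]][h["StudyInstanceUID"]][h["SeriesInstanceUID"]]
        series.append(h["SOPInstanceUID"])
    return tree


# Made-up example headers (hypothetical identifiers, not real data).
COMMON = {
    "PatientID": "P001",
    "PatientSex": "F",
    "StudyInstanceUID": "1.2.3",
    "StudyDate": "20190101",
    "Manufacturer": "ExampleVendor",
}
EXAMPLE_HEADERS = [
    dict(COMMON, SeriesInstanceUID="1.2.3.1", Modality="CT", SOPInstanceUID="1.2.3.1.1"),
    dict(COMMON, SeriesInstanceUID="1.2.3.1", Modality="CT", SOPInstanceUID="1.2.3.1.2"),
    dict(COMMON, SeriesInstanceUID="1.2.3.2", Modality="SEG", SOPInstanceUID="1.2.3.2.1"),
]

if __name__ == "__main__":
    print(split_by_entity(EXAMPLE_HEADERS[0]))
    print(build_hierarchy(EXAMPLE_HEADERS))
```

Because every instance repeats the patient-, study- and series-level attributes, a search interface can index instances independently and still reassemble the hierarchy from the unique identifiers.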
DICOM gaps assessment
● Radiology images
○ Bonus: multi-frame representation
● Pathology images
○ Converters
○ Capabilities
● Image-derived data
○ Annotations, parameter maps, qualitative evaluations
○ Radiology, pathology
● Other image types: open question
● Opportunities:
○ The knowns: converters, ontologies, capabilities, learning resources, missing data types
○ The unknowns: we have to be ready and nimble to leverage those as they come up
Clinical data
● Basic clinical data: DICOM composite
context
● Treatment, diagnosis, ...
○ Excel spreadsheets?
○ One of a kind example: DICOM SR for
QIN-HEADNECK
● What can be reconciled and harmonized -
open question
● Approach:
○ Coordination with CRDC-wide resource: Center
for Cancer Data Harmonization (CCDH)
○ CDISC BRIDG: unifying model for clinical and
research domains (harmonized with DICOM)
https://www.cdisc.org/standards/domain-information-module/bridg
Key collaborators: Erik Ziegler, Trinity Urban, Gordon
Harris (OHIF), Markus Herrmann (BWH/MGH CCDS)
Image viewer: OHIF
● Browser-based (zero install!)
○ Open source, modern JavaScript
○ DICOM standard images,
segmentations, annotations
○ Professional design
● DICOMweb supported by
Google, Siemens, and open
source servers
● VTK.js WebGL visualization
● Pathology plugin development
○ DICOM Whole Slide Imaging
○ Efficient DICOMweb pyramid
access
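The DICOMweb access pattern mentioned above can be sketched as plain URL construction. The resource paths follow the DICOMweb standard (QIDO-RS for search, WADO-RS for retrieval); frame-level retrieval is the kind of request a viewer can use to fetch individual tiles of a whole-slide image pyramid. This is a minimal Python sketch, not OHIF or IDC code: `BASE_URL`, the function names, and all UIDs are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical DICOMweb endpoint; not a real IDC service URL.
BASE_URL = "https://dicomweb.example.org/dicomweb"


def qido_search_series(base_url, study_uid, **filters):
    """QIDO-RS: build a search URL for the series of a study, optionally filtered."""
    url = f"{base_url}/studies/{study_uid}/series"
    return f"{url}?{urlencode(filters)}" if filters else url


def wado_frames(base_url, study_uid, series_uid, instance_uid, frames):
    """WADO-RS: build a URL retrieving selected frames of a multi-frame instance.

    Frame numbers are 1-based and passed as a comma-separated list.
    """
    frame_list = ",".join(str(f) for f in frames)
    return (
        f"{base_url}/studies/{study_uid}/series/{series_uid}"
        f"/instances/{instance_uid}/frames/{frame_list}"
    )


if __name__ == "__main__":
    # Search for slide microscopy (SM) series in a made-up study.
    print(qido_search_series(BASE_URL, "1.2.3", Modality="SM"))
    # Request two tiles (frames) of a made-up whole-slide image instance.
    print(wado_frames(BASE_URL, "1.2.3", "1.2.3.1", "1.2.3.1.1", [1, 2]))
```

Requesting only the frames that intersect the current viewport, rather than the whole instance, is what makes browser-based viewing of very large pathology images practical.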
Annotation example:
Crowds Cure Cancer
● Expert annotation of cancer images
from TCIA
● Booth at RSNA 2017, 2018 (and
2019)
● Built on OHIF, React, dcm4chee,
AWS
● Desktop and Mobile
● > 5,000 measurements collected
● Help out at crowds-cure.org!
Key collaborators: Erik Ziegler, Trinity Urban, Dan Rukas, Gustavo Lelis, Jayashree Kalpathy-Cramer,
Gordon Harris, Fred Prior, Justin Kirby, and more...
Institute for Systems Biology - Cancer Genomics Cloud (ISB-CGC)
● One of three Cancer Genomics Cloud Pilots, starting in September 2014
● Since October 2017, ISB-CGC is an NCI Cloud Resource (CR) component of the
NCI Cancer Research Data Commons (CRDC)
● As a Cloud Pilot, ISB-CGC built a platform that hosted and managed
controlled-access data stored in Google Cloud Storage buckets
○ This role is now being performed by the Genomic Data Commons (GDC) and the Data
Commons Framework (DCF)
○ ISB-CGC now uses Fence for handling A&A, linking Google IDs to eRA Commons IDs to provide
ISB-CGC users with access to controlled data
https://isb-cgc.org/
Cloud platform
● Our existing ISB-CGC Web Application and API production code base can be
extensively reused and leveraged, and provides a low-risk path to stand up
the IDC minimum viable product (MVP) quickly
● Our knowledge of the existing CRDC ecosystem and roadmap will guide
architectural decisions
● Google is already providing imaging datasets
https://isb-cgc.appspot.com/cohorts/new_cohort/
Pilot support of image viewing in ISB-CGC
The ISB-CGC Web Application prototyped integration of a pathology viewer (using caMicroscope, now transitioning
to OHIF) and a radiology viewer (using OHIF Viewer) for TCGA data.
Google Healthcare
● Google Cloud is the platform used
for ISB-CGC
● Google initiated work with OHIF and
PixelMed
○ Google engineers have contributed
Google Cloud support to OHIF
○ DICOMweb protocols
● Google hosts TCIA images
● BigQuery tools for extracting and
interrogating DICOM metadata
● Authentication, data security,
compute, GPU, notebooks …
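Interrogating DICOM metadata with SQL, as mentioned in the list above, can be sketched with a flat table that holds one row per DICOM instance. The example below uses Python's built-in `sqlite3` as a local stand-in for BigQuery so that it runs anywhere. The table name `dicom_metadata`, its columns, and the rows are illustrative assumptions, not the schema of any actual Google or IDC dataset.

```python
import sqlite3

# Made-up rows: one row per DICOM instance
# (PatientID, StudyInstanceUID, SeriesInstanceUID, Modality).
ROWS = [
    ("P001", "1.2.3", "1.2.3.1", "CT"),
    ("P001", "1.2.3", "1.2.3.1", "CT"),
    ("P001", "1.2.3", "1.2.3.2", "SEG"),
    ("P002", "1.2.4", "1.2.4.1", "MR"),
]


def summarize(rows):
    """Return (cases, studies, series) counts for a set of instance rows."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table layout; columns are named after DICOM keywords.
    con.execute(
        "CREATE TABLE dicom_metadata ("
        "PatientID TEXT, StudyInstanceUID TEXT, "
        "SeriesInstanceUID TEXT, Modality TEXT)"
    )
    con.executemany("INSERT INTO dicom_metadata VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        "SELECT COUNT(DISTINCT PatientID), "
        "COUNT(DISTINCT StudyInstanceUID), "
        "COUNT(DISTINCT SeriesInstanceUID) "
        "FROM dicom_metadata"
    ).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    cases, studies, series = summarize(ROWS)
    print(f"{cases} cases, {studies} DICOM studies, {series} DICOM series")
```

The same kind of aggregate query yields collection summaries in terms of cases, studies, and series.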
Datasets
● De-identification and curation: TCIA
● TCGA
○ Radiology
■ 1731 cases, 3022 DICOM studies, 20317 DICOM series
○ Pathology
■ 11007 cases, 11963 diagnostic images, 18304 frozen tissue images
○ Available in ISB-CGC
● Most public TCIA datasets are already replicated on Google Healthcare
○ Digital pathology excluded
Other sources of data
● IDC is not intended to be limited to
radiology and pathology!
○ 3D atlases of the cellular, morphological,
molecular features of human cancers over time
● Human Tumor Atlas Network (HTAN)
○ Close coordination with David Gutman
○ IDC’s Bill Longabaugh is a member of HTAN
● CPTAC Imaging in TCIA: potential
proteomics use case
● Clinical trial groups (e.g., ECOG-ACRIN)
● Pharma datasets slated for public release
through research projects
https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/implementation/human-tumor-atlas
Approach: Analytics / applications
● Goals:
○ empower researchers to do better science (integrative, larger, faster, rigorous, traceable, enable
comparative studies)
○ metadata in - metadata out
● Computational workflows applied to large datasets
○ cover radiomics, pathomics, and genomics
○ integration with containerized computational tools
● Demonstrate capabilities by implementing representative use cases
○ in coordination with domain experts
○ batch processing tools, user-guided when needed
○ deep learning and engineered technologies
● Initial focus: reproduce previously published studies
● Later stages: investigate novel aspects of the data
Approach: Applications - radiomics
● Build on numerous studies based on
TCIA datasets
○ Including those integrating imaging and
genomics data
● Engineered and deep learning
● Considered use cases
○ Correlative analyses
○ Prognostication
○ Imaging-genomic studies
Approach: Applications - pathomics
● Academic studies + industry grade
pathomics tools
● Opportunity for open source tool
development
● Use cases considered
○ correlating texture or shape features
derived from pathology images with
malignancy or survival
○ correlating texture and shape features of
cellular structures with different end
points (histological grade, clinical stage,
metastasis, lymph node spread, survival)
From: Yu et al. 2016. Predicting non-small cell lung cancer prognosis
by fully automated microscopic pathology image features. Nat
Commun.
Tile-based steatosis quantification. Homeyer et al. Focused scores
enable reliable discrimination of small differences in steatosis. Diagn.
Pathol. 13, 76 (2018).
Approach: Applications
- radiomics + pathomics
Evaluate potential links between
● radiomics quantifying radiographic
information, including macroscopic
heterogeneity
● pathomics signatures characterizing
the immune responses
● genetic markers
● clinical information and outcomes
Grossmann et al. 2017.
Defining the biological basis
of radiomic phenotypes in
lung cancer. Elife
Saltz et al. 2018. Spatial
Organization and Molecular
Correlation of
Tumor-Infiltrating
Lymphocytes Using Deep
Learning on Pathology
Images. Cell Rep.
Governance
● IDC is an NCI contract to the Frederick National Laboratory for Cancer Research (FNLCR;
Leidos Biomed.)
● Todd Pihl, PhD as FNLCR Program Manager
● Keyvan Farahani, PhD as NCI Program Director
● Monthly reporting to Leidos Biomedical
● Weekly stakeholder meetings
● IDC Advisory Committee (TBD) to provide “guidance on IDC scope, direction,
and other governance issues including what datasets the IDC should
incorporate. This group will be composed of extramural experts in cancer
imaging, related technologies and NCI driving projects”
Phase II: Minimal Viable Product
RFP-defined broad tasks:
● Implementation of Gen3 integration (Fence, IndexD)
● Demonstration of IDC portal
● Cloud installation of and access to TCGA and one other collection
● Demonstration of viewer implementation for radiological images
● Demonstration of artificial cohort generation and identification
● Cross-cloud provider interoperability and standards
● User testing
● Outreach to and input from imaging and other cancer research communities
● [Support of continuous] Availability
Target completion: late Summer 2020
Reuse of ISB-CGC Codebase
● ISB-CGC (in the initial Cloud Pilot phase) was originally developed to handle
storing the data, finding the data, and computing on it
● The pieces of ISB-CGC that were built for the first two roles in the original pilot
phase are ideal for reuse to set up the IDC
● In the current CRDC ecosystem, the roles and functionality of the Cloud
Resources (e.g. the current version of ISB-CGC), the Cancer Data Aggregator
(CDA), and data nodes such as the IDC are distributed:
○ Compute is done in Cloud Resources using data and e.g. Dockerized tools made available by
the IDC
○ Cohort creation involving multiple data types (i.e. Pan-*DC search) is implemented in higher
layers (e.g. the Cloud Resources), using the CDA to search across nodes
Outreach strategy
● Web presence (website, GitHub, mail list, Slack?)
● Interactive demonstration and learning resources: Jupyter Notebook / Google
Colab, workspaces, integration with viewers
● Publications accompanied by datasets and computational workspaces, data
descriptor publications
● Crowd-sourced annotation / analysis
● Connectathons?
● Outreach and coordination with vendors
● Tutorials at the major conferences: RSNA, MICCAI, SPIE (resources allowing!)
https://github.com/ImagingDataCommons @CancerIDC
Prior examples of outreach activities at BWH
https://projectweek.na-mic.org/
https://dicom4qi.readthedocs.io/
http://qiicr.org/dicom4miccai/
https://discourse.slicer.org/
Phase III: Production / Further development
RFP-defined broad tasks:
● ATO, FISMA compliance
● User engagement / help desk
● Work with CRDC for cross-node searching
● Demonstration of digital pathology viewer tool and other visualization
● Incorporation of additional image collections
● Support of derived datasets
● Interoperability with workspaces and cloud resources
Target completion: late May 2021
Security
● Initially, all imaging data is planned to be de-identified, therefore open access
● Federal Information Security Management Act (FISMA) Low security level to get
Authority to Operate (ATO)
● Since the system design is based on ISB-CGC (FISMA Moderate), much reuse of the
security approach (and documentation) is planned
● TCIA for data de-identification - no PHI data on IDC!
Digital pathology
● OHIF Viewer for visualization of
images and annotations
● DICOM supports digital pathology
○ Including extensive specimen metadata
● DICOM pathology annotation
capabilities will need development
● Converters
● Markus Herrmann, BWH/MGH
Center for Clinical Data Science
(CCDS) - IDC key collaborator for
the digital pathology use case
Approach for image-derived data
● Use standard DICOM objects
○ Segmentations, measurements, annotations,
parametric maps, ...
○ Numerous examples for radiology use cases
○ Pathology will require development
○ Other image types will need to be prioritized
● Improve/develop conversion tools
● Documentation, use cases to encourage
and support adoption
● Derived data submission procedures
● Search interface features and data
modeling considerations
Phase IV: Further development / Maintenance
RFP-defined broad tasks:
● Interaction with tool repositories
● Continued access and coordination of collections
● Help desk continuation
● Community engagement
Target completion: July 2023
FAQ (based on emails/questions so far)
● IDC vs TCIA - hopefully we covered this earlier
● What are your plans for image viewing and annotation?
○ OHIF; Image viewing and annotation visualization in MVP
● What image data will IDC be hosting besides TCIA?
○ Image data from high research value biomedical imaging projects generating public datasets
○ To be determined in coordination with the IDC Advisory Committee
● Will you be providing an API to IDC for accessing the images and annotations
and will you be supporting DICOMweb?
○ Yes
● Going forward when people contribute biomedical datasets, where do they go
— to TCIA or IDC? Or will data contributed to TCIA automatically go to IDC?
○ New imaging data should be submitted to TCIA, and will be pulled into IDC post-curation
● Once we do analyses, we will generate image metadata that could be
contributed to the community via IDC. Will IDC accept those?
○ Yes, our desire is to make the process of contributing those back as seamless as possible
Significance beyond Data Commons
● Scientific reproducibility, with cloud/standards/containerized analysis as
components of the solution
● Routine generation of standardized data
● Raise awareness of the value of metadata, introduce tools to enable its
collection and use
● Opportunities to engage and integrate various groups of stakeholders
(industry, clinical trial groups, pharma, researchers, clinicians)
● We believe tools developed can be applicable for establishing private “mini
commons”
Dedication
Ed Helton
1945 - 2019
Associate Director of Clinical Trials
Programs and Products, NCI
Lawrence (Larry) Clarke
1944 - 2016
Chief of the Image Technology
Development Branch, NCI
We are hiring!
Apply here: http://bit.ly/2019-IDC-BWH-job
41

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (19)

Discover edina programmefinalmeeting-28-sep-2012
Discover edina programmefinalmeeting-28-sep-2012Discover edina programmefinalmeeting-28-sep-2012
Discover edina programmefinalmeeting-28-sep-2012
 
Research engagement in EUDAT| www.eudat.eu |
Research engagement in EUDAT| www.eudat.eu | Research engagement in EUDAT| www.eudat.eu |
Research engagement in EUDAT| www.eudat.eu |
 
Towards Generating Policy-compliant Datasets (poster)
Towards GeneratingPolicy-compliant Datasets (poster)Towards GeneratingPolicy-compliant Datasets (poster)
Towards Generating Policy-compliant Datasets (poster)
 
6th COBWEB Consortium Meeting
6th COBWEB Consortium Meeting6th COBWEB Consortium Meeting
6th COBWEB Consortium Meeting
 
Some Academic Sector/NMCA outcomes from the OGC Web Service Shibboleth Intero...
Some Academic Sector/NMCA outcomes from the OGC Web Service Shibboleth Intero...Some Academic Sector/NMCA outcomes from the OGC Web Service Shibboleth Intero...
Some Academic Sector/NMCA outcomes from the OGC Web Service Shibboleth Intero...
 
EUDAT Service Suite Overview - EUDAT Summer School (Shaun de Witt, CCFE)
EUDAT Service Suite Overview - EUDAT Summer School (Shaun de Witt, CCFE)EUDAT Service Suite Overview - EUDAT Summer School (Shaun de Witt, CCFE)
EUDAT Service Suite Overview - EUDAT Summer School (Shaun de Witt, CCFE)
 
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
 
ShareGeo: Discovering and Sharing Geospatial Data - 12 months on and going open!
ShareGeo: Discovering and Sharing Geospatial Data - 12 months on and going open!ShareGeo: Discovering and Sharing Geospatial Data - 12 months on and going open!
ShareGeo: Discovering and Sharing Geospatial Data - 12 months on and going open!
 
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
 
Tiziana ferrari icri 2018 v3
Tiziana ferrari icri 2018 v3Tiziana ferrari icri 2018 v3
Tiziana ferrari icri 2018 v3
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloud
 
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for BenchmarkingVirtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
 
Overview of HNSciCloud - Bob Jones (CERN)
Overview of HNSciCloud - Bob Jones (CERN)Overview of HNSciCloud - Bob Jones (CERN)
Overview of HNSciCloud - Bob Jones (CERN)
 
Report from RDAPlenary 3 to DataCitation Community in Australia
Report from RDAPlenary 3 to DataCitation Community in AustraliaReport from RDAPlenary 3 to DataCitation Community in Australia
Report from RDAPlenary 3 to DataCitation Community in Australia
 
Authentication Methods: Shibboleth
Authentication Methods: ShibbolethAuthentication Methods: Shibboleth
Authentication Methods: Shibboleth
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
 
Certifying CISER! A Data Seal of Approval Case Study
Certifying CISER! A Data Seal of Approval Case StudyCertifying CISER! A Data Seal of Approval Case Study
Certifying CISER! A Data Seal of Approval Case Study
 
IEEE 2014 DOTNET DATA MINING PROJECTS Data mining with big data
IEEE 2014 DOTNET DATA MINING PROJECTS Data mining with big dataIEEE 2014 DOTNET DATA MINING PROJECTS Data mining with big data
IEEE 2014 DOTNET DATA MINING PROJECTS Data mining with big data
 
Research Data Management (RDM) Initiatives at the University of Edinburgh
Research Data Management (RDM) Initiatives at the University of EdinburghResearch Data Management (RDM) Initiatives at the University of Edinburgh
Research Data Management (RDM) Initiatives at the University of Edinburgh
 

Ähnlich wie Imaging Data Commons (IDC) - Introduction and intital approach

CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...
CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...
CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...
CINECAProject
 

Ähnlich wie Imaging Data Commons (IDC) - Introduction and intital approach (20)

Imaging dearry ncrdc 11062017
Imaging dearry ncrdc  11062017Imaging dearry ncrdc  11062017
Imaging dearry ncrdc 11062017
 
Making DMPs actionable and public
Making DMPs actionable and publicMaking DMPs actionable and public
Making DMPs actionable and public
 
PRISM Project Update
PRISM Project UpdatePRISM Project Update
PRISM Project Update
 
Cloud-Based Solutions for Clinical Data Management
Cloud-Based Solutions for Clinical Data ManagementCloud-Based Solutions for Clinical Data Management
Cloud-Based Solutions for Clinical Data Management
 
The Analytics and Data Science Landscape
The Analytics and Data Science LandscapeThe Analytics and Data Science Landscape
The Analytics and Data Science Landscape
 
Big Data as a Catalyst for Collaboration & Innovation
Big Data as a Catalyst for Collaboration & InnovationBig Data as a Catalyst for Collaboration & Innovation
Big Data as a Catalyst for Collaboration & Innovation
 
CDD: Vault, CDD: Vision and CDD: Models software for biologists and chemists ...
CDD: Vault, CDD: Vision and CDD: Models software for biologists and chemists ...CDD: Vault, CDD: Vision and CDD: Models software for biologists and chemists ...
CDD: Vault, CDD: Vision and CDD: Models software for biologists and chemists ...
 
ENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science ThemeENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science Theme
 
CDKP 2013
CDKP 2013CDKP 2013
CDKP 2013
 
Implementing Open Access: Effective Management of Your Research Data
Implementing Open Access: Effective Management of Your Research DataImplementing Open Access: Effective Management of Your Research Data
Implementing Open Access: Effective Management of Your Research Data
 
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
 
Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...
Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...
Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
 
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
 
CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...
CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...
CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned fro...
 
Data Standards in Radiomics Research
Data Standards in Radiomics ResearchData Standards in Radiomics Research
Data Standards in Radiomics Research
 
IOT-2016 7-9 Septermber, 2016, Stuttgart, Germany
IOT-2016  7-9 Septermber, 2016, Stuttgart, GermanyIOT-2016  7-9 Septermber, 2016, Stuttgart, Germany
IOT-2016 7-9 Septermber, 2016, Stuttgart, Germany
 
H2020 Open Research Data pilot
H2020 Open Research Data pilotH2020 Open Research Data pilot
H2020 Open Research Data pilot
 
The challenges of 3D Personal Data
The challenges of 3D Personal DataThe challenges of 3D Personal Data
The challenges of 3D Personal Data
 
An Introduction to CCDH
An Introduction to CCDHAn Introduction to CCDH
An Introduction to CCDH
 

Mehr von imgcommcall

Agenda NCI Imaging Informatics Webinar 2020-07-06
Agenda NCI Imaging Informatics Webinar 2020-07-06Agenda NCI Imaging Informatics Webinar 2020-07-06
Agenda NCI Imaging Informatics Webinar 2020-07-06
imgcommcall
 

Mehr von imgcommcall (20)

Making Radiology AI Models more robust: Federated Learning and other Approaches
Making Radiology AI Models more robust: Federated Learning and other ApproachesMaking Radiology AI Models more robust: Federated Learning and other Approaches
Making Radiology AI Models more robust: Federated Learning and other Approaches
 
Agenda NCI Imaging Informatics Webinar 2020-07-06
Agenda NCI Imaging Informatics Webinar 2020-07-06Agenda NCI Imaging Informatics Webinar 2020-07-06
Agenda NCI Imaging Informatics Webinar 2020-07-06
 
AI-LAB
AI-LABAI-LAB
AI-LAB
 
PathPresenter
PathPresenterPathPresenter
PathPresenter
 
Agenda April 6, 2020
Agenda April 6, 2020Agenda April 6, 2020
Agenda April 6, 2020
 
Medical Segmentation Decathalon
Medical Segmentation DecathalonMedical Segmentation Decathalon
Medical Segmentation Decathalon
 
Agenda - NCI Imaging Community Call
Agenda - NCI Imaging Community CallAgenda - NCI Imaging Community Call
Agenda - NCI Imaging Community Call
 
NCI Imaging community call agenda
NCI Imaging community call agendaNCI Imaging community call agenda
NCI Imaging community call agenda
 
CPTAC Data at the Genomic Data Commons
CPTAC Data at the Genomic Data CommonsCPTAC Data at the Genomic Data Commons
CPTAC Data at the Genomic Data Commons
 
CPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data CommonsCPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data Commons
 
Clinical Proteomic Tumor Analysis Consortium (CPTAC) Overview
Clinical Proteomic Tumor Analysis Consortium (CPTAC) OverviewClinical Proteomic Tumor Analysis Consortium (CPTAC) Overview
Clinical Proteomic Tumor Analysis Consortium (CPTAC) Overview
 
PRISM Semantic Integration Approach
PRISM Semantic Integration ApproachPRISM Semantic Integration Approach
PRISM Semantic Integration Approach
 
TCIA Update
TCIA UpdateTCIA Update
TCIA Update
 
Image community 2019 06-03
Image community 2019 06-03Image community 2019 06-03
Image community 2019 06-03
 
NBIA 7.0 Community Version Release
NBIA 7.0 Community Version ReleaseNBIA 7.0 Community Version Release
NBIA 7.0 Community Version Release
 
Image community 2019 05-06
Image community 2019 05-06Image community 2019 05-06
Image community 2019 05-06
 
Image community 2019 03-04
Image community 2019 03-04Image community 2019 03-04
Image community 2019 03-04
 
TCIA Update
TCIA UpdateTCIA Update
TCIA Update
 
Kaleidoscope
KaleidoscopeKaleidoscope
Kaleidoscope
 
Standardized representation of the LIDC annotations using DICOM
Standardized representation of the LIDC annotations using DICOMStandardized representation of the LIDC annotations using DICOM
Standardized representation of the LIDC annotations using DICOM
 

Imaging Data Commons (IDC) - Introduction and initial approach

  • 1. Imaging Data Commons (IDC): Introduction and initial approach Andrey Fedorov, PhD, on behalf of the IDC consortium Brigham and Women’s Hospital / Harvard Medical School andrey.fedorov@gmail.com Oct 7, 2019 - NCI Imaging Community call Slides location: http://bit.ly/2019-idc-imgcommcall
  • 2. 2 “The NCI Imaging Data Commons will be a cloud-based resource that connects researchers with 1. cancer image collections 2. a robust infrastructure that contains imaging data, subject and sample metadata and experimental metadata from disparate sources 3. resources for searching, identifying and viewing images, and 4. additional data types contained in other Cancer Research Data Commons nodes.” Cancer Research Data Commons (CRDC) Imaging Data Commons (IDC)
  • 3. IDC timeline to the award ● April 2018: request for information on IDC development ● December 11, 2018: IDC solicitation for competitive proposals released ● January 31, 2019: IDC proposals due date ● July 24, 2019: IDC contract mutually signed between Leidos Biomed and BWH 3
  • 4. IDC: The team ● Leadership: ○ Ron Kikinis*, Principal Investigator ○ Andrey Fedorov, Technical lead and project manager ● Imaging R&D: ○ BWH: Ron Kikinis, Andrey Fedorov, Hugo Aerts ○ Isomics Inc: Steve Pieper ○ Radical Imaging: Rob Lewis, Erik Ziegler ○ Pixelmed: David Clunie (aka “Mr. DICOM”) ○ Fraunhofer MEVIS: André Homeyer ● Cloud: ○ Institute for Systems Biology: Bill Longabaugh ● Security: ○ General Dynamics IT: David Pot * Ron Kikinis is 50/50 between BWH and MEVIS at the moment, 100% at BWH effective March 2020. Pictured: Bill Longabaugh, Ron Kikinis, Andrey Fedorov, Steve Pieper, David Clunie, Hugo Aerts, David Pot, André Homeyer, Rob Lewis
  • 5. Team background ● Open source image computing technology ○ 3D Slicer, OHIF, pyradiomics, dcmqi ● Network projects and collaborations ○ Quantitative Imaging Network (QIN) ○ Informatics Technology for Cancer Research (ITCR) ● Collaborations with TCIA ○ Data harmonization efforts (segmentations, LIDC) ● DICOM development ○ Refinement of the standard to address quantitative imaging use case ○ Tools, outreach, industry collaborations ● Cloud infrastructure development ○ ISB-CGC one of the 3 Cancer Genomics Cloud pilots 5
  • 6. Understanding the problem 6 CrowdFlower 2016 Data Science Report ● Imaging data = images + annotations + clinical data ( + analysis results) ○ Current community focus is on images ○ Data preparation is very time consuming ○ Limited effort to support “organic growth” of the datasets ● Need multi-site, multi-reader, multi-tool, representative cohorts ○ Cannot be done without conventions for data representation ○ Requires tools to support harmonization efforts ○ Hard if not impossible to do retrospectively ● Need harmonization for ○ Semantics, data representation, communication interfaces ● CRDC takes those problems to new levels
  • 7. ● First priority: resource useful for imaging researchers ● Opportunity to address limitations and develop missing components ○ Visualization, search, data harmonization (where “data” is not limited to “images”!) ● Empower imaging research with metadata ○ Harmonize imaging, image derived and image related data ○ Provenance ○ Search ○ FAIR ● Initial goals: radiology and pathology ● Simplify accessibility of the already popular tools ● Simplify analysis workflows ● Lead by example: development of use cases is part of the project plan ● Longer term: cross-node integration IDC vision 7
  • 8. IDC and The Cancer Imaging Archive (TCIA) ● “TCIA is a service which de-identifies and hosts a large archive of medical images of cancer accessible for public download.” ● Mostly clinical imaging data (radiology and digital pathology) ● IDC will: ○ Be part of the CRDC ecosystem, with the objective to support cross-domain data integration ○ Utilize public collections of TCIA to populate first wave of content, and will host public collections of the TCIA going forward ○ Rely on TCIA for image de-identification ○ Collaborate with TCIA on the topics of data harmonization and development of tools of shared interest ○ Encourage users to compute on the cloud by providing various incentives (e.g., compute credits) ○ Discourage downloads of the data by not offsetting the data egress charges 8
  • 9. IDC implementation - guiding principles ● Agile development approach ○ Broad initial direction, adaptive ○ Customer involvement ○ Short development cycles - sprints ● Phased implementation plan ○ Phase I: Pre Minimal Viable Product ○ Phase II: Minimal Viable Product ○ Phase III: Production / Further development ○ Phase IV: Further development / Maintenance ● IDC will host only public datasets, will NOT de-identify your data! ● Non-restrictive open source license for everything produced by IDC ● “Bag of tools” instead of monolithic development from scratch ● Standards-based 9
  • 10. Phase I: Pre Minimal Viable Product RFP-defined broad tasks: ● Definition of imaging data model, data dictionaries and ontologies ● Initial dataset and use-case definition with cross-collection data access ● Image data, metadata and file standards ● Evaluation of existing software and tools for reuse ● IDC advisory committee selection 10 Target completion: October 2019
  • 11. IDC data backbone: DICOM ● Digital Imaging and Communication in Medicine (DICOM) is the standard for communication of medical imaging information and related data ○ emphasis on metadata standardization ○ compatibility with acquisition and archival tools ○ images (CT, MR, whole slide pathology) and analysis results (annotations, segmentation, registration, quantification) ○ interoperability ○ coordinated with other standards (HL7/FHIR, JSON, XML, REST, WADO, BRIDG, SNOMED) ● History of development and adoption since 1983 ● Adopted by virtually all manufacturers of medical imaging equipment ● Open international community of stakeholders ● DICOM is a live standard! ● IDC opportunities: raise awareness, create incentives, help transitioning 11
  • 12. DICOM for data modeling ● DICOM files: combine attributes from several real-world entities ○ Patient ○ Equipment ○ Modality-specific attributes ● Tables of attributes based on modules ○ Support incremental growth of content ● Unique identifiers ● Specific tasks: ○ Machine-readable representation ○ Data model search interface ○ Performance evaluation ○ Tools http://dicom.nema.org/medical/dicom/current/output/chtml/part03/chapter_A.html DICOM Composite Instance IOD Information Model 12
  • 13. ● Radiology images ○ Bonus: multi-frame representation ● Pathology images ○ Converters ○ Capabilities ● Image-derived data ○ Annotations, parameter maps, qualitative evaluations ○ Radiology, pathology ● Other image types: open question ● Opportunities: ○ The knowns: converters, ontologies, capabilities, learning resources, missing data types ○ The unknowns: we have to be ready and nimble to leverage those as they come up DICOM gaps assessment 13
  • 14. Clinical data ● Basic clinical data: DICOM composite context ● Treatment, diagnosis, ... ○ Excel spreadsheets? ○ One of a kind example: DICOM SR for QIN-HEADNECK ● What can be reconciled and harmonized - open question ● Approach: ○ Coordination with CRDC-wide resource: Center for Cancer Data Harmonization (CCDH) ○ CDISC BRIDG: unifying model for clinical and research domains (harmonized with DICOM) 14 https://www.cdisc.org/standards/domain-information-module/bridg
  • 15. Key collaborators: Erik Ziegler, Trinity Urban, Gordon Harris (OHIF), Markus Herrmann (BWH/MGH CCDS) Image viewer: OHIF ● Browser-based (zero install!) ○ Open source, modern Javascript ○ DICOM standard images, segmentations, annotations ○ Professional design ● DICOMweb supported by Google, Siemens, and open source servers ● VTK.js WebGL visualization ● Pathology plugin development ○ DICOM Whole Slide Imaging ○ Efficient DICOMweb pyramid access 15
  • 16. Annotation example: Crowds Cure Cancer ● Expert annotation of cancer images from TCIA ● Booth at RSNA 2017, 2018 (and 2019) ● Built on OHIF, react, dcm4chee, AWS ● Desktop and Mobile ● > 5,000 measurements collected ● Help out at crowds-cure.org! 16 Key collaborators: Erik Ziegler, Trinity Urban, Dan Rukas, Gustavo Lelis, Jayashree Kalpathy-Cramer, Gordon Harris, Fred Prior, Justin Kirby, and more...
  • 17. Institute for Systems Biology - Cancer Genomics Cloud (ISB-CGC) ● One of three Cancer Genomics Cloud Pilots, starting in September 2014 ● Since October 2017, ISB-CGC has been an NCI Cloud Resource (CR) component of the NCI Cancer Research Data Commons (CRDC) ● As a Cloud Pilot, ISB-CGC built a platform that hosted and managed controlled-access data stored in Google Cloud Storage buckets ○ This role is now being performed by the Genomic Data Commons (GDC) and the Data Commons Framework (DCF) ○ ISB-CGC now uses Fence for handling A&A, linking Google IDs to eRA Commons IDs to provide ISB-CGC users with access to controlled data 17 https://isb-cgc.org/
  • 18. Cloud platform ● Our existing ISB-CGC Web Application and API production code base can be extensively reused and leveraged, and provides a low-risk path to stand up the IDC minimum viable product (MVP) quickly ● Our knowledge of the existing CRDC ecosystem and roadmap will guide architectural decisions ● Google is already providing imaging datasets 18
  • 20. Pilot support of image viewing in ISB-CGC 20 ISB-CGC Web Application prototyped integration of a pathology viewer (using caMicroscope -> transitioning to OHIF) and a radiology viewer (using OHIF Viewer) for TCGA data:
  • 21. Google Healthcare ● Google Cloud is the platform used for ISB-CGC ● Google initiated work with OHIF and PixelMed ○ Google engineers have contributed Google Cloud support to OHIF ○ DICOMweb protocols ● Google hosts TCIA images ● BigQuery tools for extracting and interrogating DICOM metadata ● Authentication, data security, compute, GPU, notebooks … 21
  • 22. Datasets ● De-identification and curation: TCIA ● TCGA ○ Radiology ■ 1731 cases, 3022 DICOM studies, 20317 DICOM series ○ Pathology ■ 11007 cases, 11963 diagnostic images, 18304 frozen tissue images ○ Available in ISB-CGC ● Most public TCIA datasets are already replicated on Google Healthcare ○ Digital pathology excluded 22
  • 23. Other sources of data 23 ● IDC is not intended to be limited to radiology and pathology! ○ 3D atlases of the cellular, morphological, molecular features of human cancers over time ● Human Tumor Atlas Network (HTAN) ○ Close coordination with David Gutman ○ IDC’s Bill Longabaugh is a member of HTAN ● CPTAC Imaging in TCIA: potential proteomics use case ● Clinical trial groups (e.g., ECOG-ACRIN) ● Pharma datasets slated for public release through research projects https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/implementation/human-tumor-atlas
  • 24. Approach: Analytics / applications ● Goals: ○ empower researchers to do better science (integrative, larger, faster, rigorous, traceable, enable comparative studies) ○ metadata in - metadata out ● Computational workflows applied to large datasets ○ cover radiomics, pathomics, and genomics ○ integration with containerized computational tools ● Demonstrate capabilities by implementing representative use cases ○ in coordination with domain experts ○ batch processing tools, user-guided when needed ○ deep learning and engineered technologies ● Initial focus: reproduce previously published studies ● Later stages: investigate novel aspects of the data 24
  • 25. Approach: Applications - radiomics ● Build on numerous studies based on TCIA datasets ○ Including those integrating imaging and genomics data ● Engineered and deep learning ● Considered use cases ○ Correlative analyses ○ Prognostication ○ Imaging-genomic studies 25
  • 26. Approach: Applications - pathomics ● Academic studies + industry grade pathomics tools ● Opportunity for open source tool development ● Use cases considered ○ correlating texture or shape features derived from pathology images with malignancy or survival ○ correlating texture and shape features of cellular structures with different end points (histological grade, clinical stage, metastasis, lymph node spread, survival) 26 From: Yu et al. 2016. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. Tile-based steatosis quantification. Homeyer et al. Focused scores enable reliable discrimination of small differences in steatosis. Diagn. Pathol. 13, 76 (2018).
  • 27. Approach: Applications - radiomics + pathomics Evaluate potential links between ● radiomics quantifying radiographic information, including macroscopic heterogeneity ● pathomics signatures characterizing the immune responses ● genetic markers ● clinical information and outcomes 27 Grossmann et al. 2017. Defining the biological basis of radiomic phenotypes in lung cancer. Elife Saltz et al. 2018. Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images. Cell Rep.
  • 28. Governance ● IDC as an NCI contract to the Frederick National Labs for Cancer Research (or Leidos Biomed.) ● Todd Pihl, PhD as FNLCR Program Manager ● Keyvan Farahani, PhD as NCI Program Director ● Monthly reporting to Leidos Biomedical ● Weekly stakeholder meetings ● IDC Advisory Committee (TBD) to provide “guidance on IDC scope, direction, and other governance issues including what datasets the IDC should incorporate. This group will be composed of extramural experts in cancer imaging, related technologies and NCI driving projects" 28
  • 29. Phase II: Minimal Viable Product RFP-defined broad tasks: ● Implementation of Gen3 integration (Fence, IndexD) ● Demonstration of IDC portal ● Cloud installation of and access to TCGA and one other collection ● Demonstration of viewer implementation for radiological images ● Demonstration of artificial cohort generation and identification ● Cross-cloud provider interoperability and standards ● User testing ● Outreach to and input from imaging and other cancer research communities ● [Support of continuous] Availability 29 Target completion: late Summer 2020
  • 30. Reuse of ISB-CGC Codebase ● ISB-CGC (in the initial Cloud Pilot phase) was originally developed to handle storing the data, finding the data, and computing on it ● The pieces of ISB-CGC that were built for the first two roles in the original pilot phase are ideal for reuse to set up the IDC ● In the current CRDC ecosystem, the roles and functionality of the Cloud Resources (e.g. the current version of ISB-CGC), the Cancer Data Aggregator (CDA), and data nodes such as the IDC are distributed: ○ Compute is done in Cloud Resources using data and e.g. Dockerized tools made available by the IDC ○ Cohort creation involving multiple data types (i.e. Pan-*DC search) is implemented in higher layers (e.g. the resources), using the CDA to search across nodes 30
  • 31. Outreach strategy ● Web presence (website, GitHub, mail list, Slack?) ● Interactive demonstration and learning resources: Jupyter Notebook / Google Colab, workspaces, integration with viewers ● Publications accompanied by datasets and computational workspaces, data descriptor publications ● Crowd-sourced annotation / analysis ● Connectathons? ● Outreach and coordination with vendors ● Tutorials at the major conferences: RSNA, MICCAI, SPIE (resources allowing!) 31 https://github.com/ImagingDataCommons @CancerIDC
  • 32. Prior examples of outreach activities at BWH 32 https://projectweek.na-mic.org/ https://dicom4qi.readthedocs.io/ http://qiicr.org/dicom4miccai/ https://discourse.slicer.org/
  • 33. Phase III: Production / Further development RFP-defined broad tasks: ● ATO, FISMA compliance ● User engagement / help desk ● Work with CRDC for cross-node searching ● Demonstration of digital pathology viewer tool and other visualization ● Incorporation of additional image collections ● Support of derived datasets ● Interoperability with workspaces and cloud resources 33 Target completion: late May 2021
  • 34. Security ● Initially, all imaging data is planned to be de-identified, therefore open access ● Federal Information Security Management Act (FISMA) Low security to get Authority to Operate (ATO) ● Since the design of the system is based on ISB-CGC (FISMA Moderate), much re-use of security approach (and documentation) is planned ● TCIA for data de-identification - no PHI data on IDC! 34
  • 35. Digital pathology ● OHIF Viewer for visualization of images and annotations ● DICOM supports digital pathology ○ Including extensive specimen metadata ● DICOM pathology annotation capabilities will need development ● Converters ● Markus Herrmann, BWH/MGH Center for Clinical Data Science (CCDS) - IDC key collaborator for the digital pathology use case 35
  • 36. Approach for image-derived data ● Use standard DICOM objects ○ Segmentations, measurements, annotations, parametric maps, ... ○ Numerous examples for radiology use cases ○ Pathology will require development ○ Other image types will need to be prioritized ● Improve/develop conversion tools ● Documentation, use cases to encourage and support adoption ● Derived data submission procedures ● Search interface features and data modeling considerations 36
  • 37. Phase IV: Further development / Maintenance RFP-defined broad tasks: ● Interaction with tool repositories ● Continued access and coordination of collections ● Help desk continuation ● Community engagement 37 Target completion: July 2023
  • 38. FAQ (based on emails/questions so far) ● IDC vs TCIA - hopefully we covered this earlier ● What are your plans for image viewing and annotation? ○ OHIF; Image viewing and annotation visualization in MVP ● What image data will IDC be hosting besides TCIA? ○ Image data from high research value biomedical imaging projects generating public datasets ○ To be determined in coordination with the IDC Advisory Committee ● Will you be providing an API to IDC for accessing the images and annotations and will you be supporting DICOMweb? ○ Yes ● Going forward when people contribute biomedical datasets, where do they go: to TCIA or IDC? Or will data contributed to TCIA automatically go to IDC? ○ New imaging data should be submitted to TCIA, and will be pulled into IDC post-curation ● Once we do analyses, we will generate image metadata; those could be contributed to the community via IDC. Will IDC accept those? ○ Yes, our desire is to make the process of contributing those back as seamless as possible 38
  • 39. Significance beyond Data Commons ● Scientific reproducibility and cloud/standards/containerized analysis as components of the solution ● Routine generation of standardized data ● Raise awareness of the value of metadata, introduce tools to enable its collection and use ● Opportunities to engage and integrate various groups of stakeholders (industry, clinical trial groups, pharma, researchers, clinicians) ● We believe tools developed can be applicable for establishing private “mini commons” 39
  • 40. Dedication 40 Ed Helton 1945 - 2019 Associate Director of Clinical Trials Programs and Products, NCI Lawrence (Larry) Clarke 1944 - 2016 Chief of the Image Technology Development Branch, NCI
  • 41. We are hiring! Apply here: http://bit.ly/2019-IDC-BWH-job 41
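
Slides 15 and 38 mention DICOMweb as the planned access interface for images and annotations. As a rough illustration only, the sketch below builds the standard DICOMweb resource paths for searching studies and series (QIDO-RS) and for retrieving a single instance (WADO-RS). The base URL, the helper function names, and the example identifiers are hypothetical; the slides do not specify an IDC endpoint or client library.

```python
from urllib.parse import urlencode, quote

# Hypothetical base URL: the slides do not name an IDC DICOMweb endpoint.
BASE_URL = "https://example.org/dicomweb"


def search_studies_url(base, **filters):
    """QIDO-RS study search: {base}/studies?{attribute}={value}&..."""
    # Sort filters so the resulting URL is deterministic.
    query = urlencode(sorted(filters.items()))
    return f"{base}/studies" + (f"?{query}" if query else "")


def search_series_url(base, study_uid):
    """QIDO-RS series search within one study."""
    return f"{base}/studies/{quote(study_uid)}/series"


def retrieve_instance_url(base, study_uid, series_uid, instance_uid):
    """WADO-RS retrieval of one instance, addressed by its three UIDs."""
    return (
        f"{base}/studies/{quote(study_uid)}"
        f"/series/{quote(series_uid)}"
        f"/instances/{quote(instance_uid)}"
    )


if __name__ == "__main__":
    # Illustrative values only; "TCGA-XX" is not a real patient identifier.
    print(search_studies_url(BASE_URL, PatientID="TCGA-XX", ModalitiesInStudy="CT"))
    print(search_series_url(BASE_URL, "1.2.3"))
    print(retrieve_instance_url(BASE_URL, "1.2.3", "4.5", "6.7"))
```

The study, series, and instance nesting in these paths mirrors the unique identifiers of the DICOM information model described on slide 12, which is why a viewer such as OHIF can address any object with three UIDs.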