Cell Line Metadata in ArxLab
• Create ArxLab Registration objects for
parental cell lines with minimal common
metadata
• Create ArxLab Registration entries for
project-specific daughter cell lines with
additional project-specific metadata
• Standardize where possible on ArxLab
Assay definitions
Why this is a Problem
• Lack of a common practices/platform
inhibits collaboration between
groups since they have to rely on
external sources to know what internal
research has been done on a cell line
• When there is collaboration, e.g., with
one group supplying cell lines and data
to another group, there may be issues with
updating metadata, e.g., a primary site
change
• Lack of a common vocabulary leads to
data quality issues, e.g., what do you
mean by "Doubling Time"?
• Velocity of scientific discovery is
slower as a result
Challenge
• One of the key challenges in conducting research in a diverse and dynamic organization like the Broad
Institute is connecting islands of related data.
• Since scientific groups have traditionally been separated from each other, relying on each other as internal suppliers and
customers, their data have similarly been separated; it is not uncommon for two groups to work on the same cell line but
have no means of finding out about each other's work, partly because they track cell-line data in different ways
• The Broad Institute has collaborated with Arxspan to develop a configuration of ArxLab to share a common registry of parental
cell lines, allowing different groups to have a common vocabulary about cell lines and opening collaboration possibilities for both
new science and accelerated progress on existing science
Solution Framework
• Use institutional
database as the
canonical source of
cell line metadata
• Ingest institutional data into local
data management
systems to link
project specific data
to parental cell line data
• Have a common registry of parental
cell lines (available to all) and
daughter cell lines (project specific
by default)
• Preserve heredity of cell lines and
allow searching by heredity
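The registry idea in the framework above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ArxLab configuration: the `CellLine` and `Registry` classes, the daughter line names, and the project name are all hypothetical. Each line records its parent, so heredity is preserved and can be searched in both directions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CellLine:
    name: str
    parent: Optional[str] = None      # None marks a parental cell line
    project: Optional[str] = None     # owning project for daughter lines
    metadata: Dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory registry of parental and daughter cell lines."""

    def __init__(self) -> None:
        self._lines: Dict[str, CellLine] = {}

    def register(self, line: CellLine) -> None:
        # Reject duplicates and unknown parents so heredity stays a tree.
        if line.name in self._lines:
            raise ValueError(f"already registered: {line.name}")
        if line.parent is not None and line.parent not in self._lines:
            raise KeyError(f"unknown parent: {line.parent}")
        self._lines[line.name] = line

    def ancestors(self, name: str) -> List[str]:
        """Walk up from a line to its root parental line."""
        chain: List[str] = []
        parent = self._lines[name].parent
        while parent is not None:
            chain.append(parent)
            parent = self._lines[parent].parent
        return chain

    def descendants(self, name: str) -> List[str]:
        """All lines derived, directly or indirectly, from `name`."""
        return sorted(n for n in self._lines if name in self.ancestors(n))


registry = Registry()
registry.register(CellLine("A375_SKIN", metadata={"primary_site": "skin"}))
# Daughter names and project below are made up for illustration.
registry.register(CellLine("A375_SKIN_D1", parent="A375_SKIN", project="ProjectX"))
registry.register(CellLine("A375_SKIN_D1_C2", parent="A375_SKIN_D1", project="ProjectX"))

print(registry.ancestors("A375_SKIN_D1_C2"))
print(registry.descendants("A375_SKIN"))
```

Storing only the parent link keeps registration simple; a production system would more likely index heredity in the database than recompute ancestor chains on every search.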
Example
• What metadata is tracked at what level?
• Who decides the metadata categories
and values?
• How do we promote project-specific
metadata to parental cell lines?
Desired State
• Common cell line metadata categories
and data
• Defined, published, flexible processes
for collaborative review/approval of
metadata categories and data (e.g.,
intake, change, promotion)
• Retain ability for groups to work
independently on project-specific
metadata and data
• Technology that enables widespread
sharing of cell-line metadata categories
and data, inside and outside Broad
Hypothesis
• Use best practices from
manufacturing around
master data management
(e.g., Master Data Review board) to
build necessary organizational
practices
• Use technology to enable
organizational processes
• Principles:
o Technology without
organizational
processes is a waste
o Organizational processes without
enabling, sustainable use of
technology will wither
Institutional Cell Line Database
Sample Entity Relationship Diagram
• Tracks multiple names and
annotations (e.g., lineage) and
the source of these claims
• Has no concept of samples or
instances (annotates the
abstract entity only)
• cell_sample: Name space
for a cell line name, e.g., CCLE,
CDDB, ATCC
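One way to picture a store that tracks names and annotations together with the source of each claim is sketched below. The `Claim` and `CellLineEntity` classes are hypothetical and do not reproduce the poster's entity relationship diagram; the lineage values are illustrative only. Consistent with the description above, only the abstract cell line is modeled, with no samples or instances.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Claim:
    kind: str    # "name" or an annotation category such as "lineage"
    value: str   # the asserted value
    source: str  # who made the claim


@dataclass
class CellLineEntity:
    """The abstract cell line; no samples or instances are modeled."""
    claims: List[Claim] = field(default_factory=list)

    def add(self, kind: str, value: str, source: str) -> None:
        self.claims.append(Claim(kind, value, source))

    def values(self, kind: str) -> Dict[str, List[str]]:
        """Map each asserted value of `kind` to the sorted sources claiming it."""
        by_value = defaultdict(set)
        for claim in self.claims:
            if claim.kind == kind:
                by_value[claim.value].add(claim.source)
        return {value: sorted(sources) for value, sources in by_value.items()}


a375 = CellLineEntity()
a375.add("name", "A375_SKIN", "CCLE")
a375.add("name", "30", "CDDB")
a375.add("lineage", "skin", "CCLE")   # illustrative annotation value
a375.add("lineage", "skin", "CDDB")   # illustrative annotation value

print(a375.values("lineage"))
```

Keeping the source on every claim means that disagreeing annotations can coexist and be reviewed later instead of silently overwriting each other.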
Enabling Cross-Group
Collaboration on Cell Lines
Data exchange via JavaScript Object Notation (JSON) file:
cell_sample = { cell_sample_names: [
{cell_name_type: "CCLE", cell_sample_name: "A375_SKIN"},
{cell_name_type: "cddb", cell_sample_name: "30"},
{cell_name_type: "ATCC", cell_sample_name: "ATCC: A-375 [A375] (ATCC® CRL-1619™)"} ] }
• cell_name_type: Name for
cell line and internal priority of
that name, e.g., may prefer one
name to another name
• cell_sample_name: array of
names for a cell line, e.g.,
o CCLE: A375_SKIN
o CDDB: 30
o ATCC: A-375 [A375] (ATCC®
CRL-1619™)
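A consumer of such an exchange file might read it along the following lines. This is a minimal sketch: the record is rewritten as strict JSON (quoted keys, no assignment), the `names_by_type` and `preferred_name` helpers are hypothetical, and the priority order is an assumption, since the poster only says that one name may be preferred over another.

```python
import json

# Exchange record following the example above, written as strict JSON.
# The lower-case "cddb" spelling is kept as it appears in the example.
RECORD = """
{
  "cell_sample_names": [
    {"cell_name_type": "CCLE", "cell_sample_name": "A375_SKIN"},
    {"cell_name_type": "cddb", "cell_sample_name": "30"},
    {"cell_name_type": "ATCC", "cell_sample_name": "ATCC: A-375 [A375] (ATCC® CRL-1619™)"}
  ]
}
"""

# Hypothetical priority order, not taken from the poster.
NAME_PRIORITY = ["CCLE", "ATCC", "CDDB"]


def names_by_type(record: dict) -> dict:
    """Map each upper-cased name type to its cell line name."""
    return {
        entry["cell_name_type"].upper(): entry["cell_sample_name"]
        for entry in record["cell_sample_names"]
    }


def preferred_name(record: dict, priority=NAME_PRIORITY):
    """Return the name from the highest-priority name type present, else None."""
    names = names_by_type(record)
    for name_type in priority:
        if name_type in names:
            return names[name_type]
    return None


record = json.loads(RECORD)
print(preferred_name(record))
```

Upper-casing the name type on read is a defensive choice so that "cddb" and "CDDB" resolve to the same namespace.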
Bruce Kozuma, PMP, CPIM
Broad Institute
bkozuma@broadinstitute.org
Current State
• Multiple groups creating and using cell
lines at the Broad, e.g., Achilles, PRISM,
Cancer Cell Line Encyclopedia (CCLE)
• Some canonical sources of cell-line
data at Broad, e.g., Cancer Cell Line
Dependencies Database (CDDB)
• However!
o Limited coordination in definitions
of what constitutes a unique cell line
and how changes are made to that
definition over time
o No effective mechanisms to curate,
register, or search such definitions
o No automated refresh cycle for data
in CDDB
Credits
CDD Data Curation
Paul Clemons
Mahmoud Ghandi
Shuba Gopal
Gregory Gydush
Barbara Weir
Achilles
Francesca Vazquez
Sasha Pantel
Nicole Dabkowski
Phil Montgomery
Glenn Cowley
PRISM
Chris Mader
Jen Roth
Sam Bender
Massami Laird
Ed McBride
Broad Management
Alex Burgin
Anthony Philippakis
Scott Sutherland
BITS
Chris Dwan
Eric Jones
Arxspan
Jeff Carter
Kate Hardy

2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
