Enabling Cross-Group
Collaboration on Cell Lines
via Arxspan's ArxLab
2016/04/06, v1
Authors
• Bruce Kozuma is a project/program manager in the Broad
Information Technology Services (BITS) department with
experience in software development, operations, and IT in
industries such as manufacturing, telecommunications,
biotechnology, and biomedical research.
• Paul Clemons is director of computational chemical biology
research in the Center for the Science of Therapeutics
(CSofT) at the Broad Institute. He and his team use
quantitative measurement, computational, and
visualization techniques to enable systematic use of small
molecules to explore biology, especially disease biology.
2
About the Broad Institute
• A collaborative community
pioneering a new model of
biomedical science; views itself as
an experiment in a new way of
doing science, empowering
researchers to:
– Act nimbly
– Work boldly
– Share openly
– Reach globally
3
Current Cell Line
Management State
• Multiple groups creating and using cell lines at the Broad, e.g.,
– Project Achilles, Profiling Relative Inhibition Simultaneously in Mixtures
(PRISM), Cancer Cell Line Encyclopedia (CCLE), Center for the Science of
Therapeutics (CSofT), Connectivity Map (CMAP), Center for the
Development of Therapeutics (CDoT)
• Some canonical sources of cell-line data at Broad, e.g.,
– Cancer Cell Line Dependencies Database (CDDB)
• However!
– Limited coordination in definitions of what constitutes a unique cell line and
how changes are made to that definition over time
– No effective mechanisms to curate, register, or search such definitions
– No automated refresh cycle for data in CDDB
4
Why is this a Problem?
• Lack of a common platform inhibits collaboration between
groups since they have to rely on external sources to know
what internal research has been done on a cell line
• When there is collaboration, e.g., one group supplying
cell lines and data to another group, there may be issues with
updating metadata, e.g., a primary site change
• Lack of a common vocabulary leads to data quality issues,
e.g., what do you mean by "Doubling Time"?
• Velocity of scientific discovery is slower as a result
5
Practical examples
• What metadata is tracked at what level?
• Who decides the metadata categories and values?
• How do we promote project-specific metadata to parental cell lines?
6
Practical examples
• Who decides two or more cell lines are the same thing?
– Example: A375 and an unknown cell line
– Heuristic: They are the same cell line if they have the same genomic
fingerprint and the same source (e.g., individual and tissue type);
more measurements of sameness to be added later
7
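The heuristic on the slide above can be sketched in a few lines of Python. This is only an illustration: the `CellLineRecord` class, its field names, and all sample values are hypothetical, and exact string equality stands in for whatever fingerprint comparison is actually used (a real comparison would likely tolerate some mismatch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellLineRecord:
    """Hypothetical record; field names are illustrative, not from the slides."""
    name: str
    fingerprint: str  # stand-in for a genomic fingerprint
    individual: str   # donor the line was derived from
    tissue: str       # tissue type of origin


def same_cell_line(a: CellLineRecord, b: CellLineRecord) -> bool:
    """Same genomic fingerprint and same source (individual and tissue type)."""
    same_source = (a.individual, a.tissue) == (b.individual, b.tissue)
    return a.fingerprint == b.fingerprint and same_source


# Made-up values, for illustration only.
a375 = CellLineRecord("A375", "fp-001", "donor-17", "skin")
unknown = CellLineRecord("unknown line", "fp-001", "donor-17", "skin")
other = CellLineRecord("other line", "fp-002", "donor-17", "skin")

print(same_cell_line(a375, unknown))
print(same_cell_line(a375, other))
```

Additional measurements of sameness could be added as further fields and further conjuncts in `same_cell_line`.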
Desired Situation
• Common cell line metadata categories and data
• Defined, published, flexible processes for collaborative
review/approval of metadata categories and data (e.g.,
intake, change, promotion)
• Retain ability for groups to work independently on project-
specific metadata and data
• Technology that enables widespread sharing of cell-line
metadata categories and data, inside and outside Broad
8
Cell Lines Metadata
9
Hypothesis: Manufacturing Practices
& Appropriate Technology Can Help
• Use best practices from manufacturing around
master data management to build necessary
organizational practices
• Use technology to enable organizational practices
• Principles:
– Technology without organizational practices is a waste
– Organizational practices without enabling, sustainable
use of technology will wither
10
Cell Line Master Data Review Board
• Establish a cell line master data review board to
review metadata categories and data
stewardship/management practices
– Draws from Material Review Boards in manufacturing
– Provides a forum for a “coalition of the willing” to come
to consensus about metadata categories before
categories/values are established and to curate changes
before making them
– Provides institutional sponsorship, above the level of
individual projects, while being collaborative
11
Cell Line Master Data Review Board
Proposed Sponsorship/Membership
• Office of the Chief Data Officer sponsors the board to
provide cross-project arbitration
• Initial membership by organization:
– Office of the Chief Data Officer
– Office of the Chief Science Officer
– Developer of the institutional database, e.g., CSofT
– Projects creating cell lines and metadata, e.g., PRISM, Achilles
– Groups ingesting cell line metadata, e.g., Proteomics, CDoT
– BITS as facilitator (works across organization, neutral about science)
– Ad hoc members
12
Cell Line Master Data Review Board
Proposed Policies/Procedures
• Board mechanics: Governance, changes to membership, etc.
• Develop canonical source of parental cell line definition
– Assumes existing metadata categories and values can be used
• Initial methods
– Register new cell lines
– Add new metadata categories
– Add new metadata to existing categories
– Change metadata categories or values for existing cell lines
– Track provenance of names and annotations (differences left to end
users to resolve)
13
Framework for Sharing Cell Line
Metadata
• Use institutional database as the canonical source of cell line
metadata
• Provide means of ingesting institutional data into local data
management systems to link project-specific data to
parental cell line data
• In the local data management system, have a common
registry of parental cell lines (available to all) and daughter
cell lines (project-specific by default)
• Preserve heredity of cell lines and allow searching by it
14
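One way to preserve heredity and make it searchable is a simple child-to-parent map. The sketch below assumes every daughter line records exactly one parent; the `PARENT_OF` table and the daughter names in it are invented for illustration and do not come from the Broad registry.

```python
from typing import Dict, List, Optional

# child -> parent; None marks a parental line. Daughter names are made up.
PARENT_OF: Dict[str, Optional[str]] = {
    "A375": None,
    "A375-Cas9": "A375",
    "A375-Cas9-clone3": "A375-Cas9",
}


def lineage(name: str) -> List[str]:
    """Return the chain from a line up to its parental line."""
    chain = [name]
    while PARENT_OF[chain[-1]] is not None:
        chain.append(PARENT_OF[chain[-1]])
    return chain


def descendants(name: str) -> List[str]:
    """Return every registered line derived, directly or indirectly, from name."""
    return sorted(c for c in PARENT_OF if c != name and name in lineage(c))


print(lineage("A375-Cas9-clone3"))
print(descendants("A375"))
```

With this shape, parental lines can stay visible to everyone while daughter entries carry a project-level access flag, which matches the "project-specific by default" rule on the slide.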
Institutional Cell Line Database
Sample Entity Relationship Diagram
• Tracks multiple names and annotations (e.g., lineage) and
the source of these claims
• Has no concept of samples or instances (annotates the
abstract entity only)
15
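The idea of tracking each name or annotation together with the source that asserted it can be sketched as a list of claims about an abstract cell line. Everything here is an assumption made for illustration: the `Claim` class, the `CL-0001` identifier, and the lineage values are hypothetical and are not taken from the institutional database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Claim:
    """One assertion about an abstract cell line, with its source."""
    cell_line_id: str  # hypothetical common ID for the abstract entity
    attribute: str     # e.g., "name" or "lineage"
    value: str
    source: str        # who made the claim


# Illustrative claims; the lineage values are invented.
claims = [
    Claim("CL-0001", "name", "A375_SKIN", "CCLE"),
    Claim("CL-0001", "name", "A-375 [A375] (ATCC® CRL-1619™)", "ATCC"),
    Claim("CL-0001", "lineage", "skin", "CCLE"),
    Claim("CL-0001", "lineage", "melanoma", "hypothetical-source"),
]


def values_with_sources(
    all_claims: List[Claim], cell_line_id: str, attribute: str
) -> Dict[str, List[str]]:
    """Group the claimed values of one attribute by the sources asserting them."""
    by_value = defaultdict(list)
    for claim in all_claims:
        if claim.cell_line_id == cell_line_id and claim.attribute == attribute:
            by_value[claim.value].append(claim.source)
    return dict(by_value)


print(values_with_sources(claims, "CL-0001", "lineage"))
```

Conflicting values are kept side by side rather than merged, which leaves the differences to end users to resolve.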
Institutional Cell Line Database
Sample Data Exchange Mechanism
Data exchange via JavaScript Object Notation (JSON) file:
cell_sample = {
  "cell_sample_names": [
    {"cell_name_type": "CCLE",
     "cell_sample_name": "A375_SKIN"},
    {"cell_name_type": "cddb",
     "cell_sample_name": "30"},
    {"cell_name_type": "ATCC",
     "cell_sample_name": "A-375 [A375] (ATCC® CRL-1619™)"}
  ]
}
• cell_sample: Name space for a cell
line name, e.g., CCLE, CDDB, ATCC
• cell_name_type: Name for a cell line
and internal priority of that name,
e.g., may prefer one name to another
name
• cell_sample_name: array of names
for a cell line, e.g.,
– CCLE: A375_SKIN
– CDDB: 30
– ATCC: A-375 [A375] (ATCC® CRL-1619™)
16
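A consumer of this exchange file might load it and index the names by namespace. The sketch below uses only Python's standard `json` module and reads `cell_name_type` as the namespace, as the sample values suggest. The function name `names_by_namespace` and the choice to upper-case namespaces (so that "cddb" and "CDDB" match) are assumptions, not part of the slides.

```python
import json

# The slide's sample, written as strict JSON (keys quoted, no assignment).
payload = """
{
  "cell_sample_names": [
    {"cell_name_type": "CCLE", "cell_sample_name": "A375_SKIN"},
    {"cell_name_type": "cddb", "cell_sample_name": "30"},
    {"cell_name_type": "ATCC", "cell_sample_name": "A-375 [A375] (ATCC® CRL-1619™)"}
  ]
}
"""


def names_by_namespace(cell_sample: dict) -> dict:
    """Map each upper-cased namespace to the name it assigns to the cell line."""
    return {
        entry["cell_name_type"].upper(): entry["cell_sample_name"]
        for entry in cell_sample["cell_sample_names"]
    }


names = names_by_namespace(json.loads(payload))
print(names["CCLE"])
print(names["CDDB"])
```

If a namespace could contribute more than one name, the dictionary values would need to become lists; the sample on the slide shows one name per namespace.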
Local Data Management System:
Laboratory Data Management (LDM)
• Project for BITS to provide centrally managed/supported
solutions for management of laboratory data, divided into
functions:
– Data capture/archive (instruments and other sources)
– Container inventory/registration (chemical,
biological, hybrid)/sample management
– Core Electronic Laboratory Notebook (ELN, experiment
documentation/IP protection/linking to data)
– Data/workflow management
– Data analysis/visualization
17
Cell Line Metadata and Data in LDM
Cell Line Metadata and Data in ArxLab
19
Next Steps for Sharing Cell Line
Metadata
• Work out data privacy/classification restrictions
• Phased implementation for sharing data from the institutional cell
line database with external systems like ArxLab
– Phase 1: Import static list (e.g., JSON file) of parental cell lines (~117K) and
synonyms into ArxLab Registration with type-ahead to autocomplete
names, e.g., A37 shows A375
– Phase 2: Add resolution of entered names to a common cell line ID and
preferred name, e.g., entering A375_SKIN resolves to A375 upon entry
– Phase 3: Automatically update LDM via periodic push from the institutional cell line
database, including setting up a legal framework for data distribution
20
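Phases 1 and 2 can be illustrated with a small sketch: prefix lookup over a sorted list of names for type-ahead, and a synonym table for resolution. This is not ArxLab's API or implementation. The `SYNONYMS` table, the `CL-…` identifiers, and both function names are hypothetical, and matching is case-sensitive for simplicity.

```python
import bisect
from typing import List, Optional, Tuple

# synonym -> (hypothetical common cell line ID, preferred name)
SYNONYMS = {
    "A375": ("CL-0001", "A375"),
    "A375_SKIN": ("CL-0001", "A375"),
    "A549": ("CL-0002", "A549"),
}
SORTED_NAMES = sorted(SYNONYMS)


def type_ahead(prefix: str, limit: int = 10) -> List[str]:
    """Phase 1: return up to `limit` registered names starting with `prefix`."""
    start = bisect.bisect_left(SORTED_NAMES, prefix)
    matches = []
    for name in SORTED_NAMES[start:]:
        if not name.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(name)
    return matches


def resolve(entered: str) -> Optional[Tuple[str, str]]:
    """Phase 2: map an entered name to (common ID, preferred name), or None."""
    return SYNONYMS.get(entered)


print(type_ahead("A37"))
print(resolve("A375_SKIN"))
```

Binary search over a sorted list keeps each lookup cheap even for a list on the order of the ~117K parental lines mentioned above; a static list like this would be rebuilt on each import until the Phase 3 periodic push is in place.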
Acknowledgements
Achilles
Francesca Vazquez
Sasha Pantel
Nicole Dabkowski
Phil Montgomery
Glenn Cowley
PRISM
Chris Mader
Jen Roth
Sam Bender
Massami Laird
Ed McBride
CDDB Data Curation
Paul Clemons
Mahmoud Ghandi
Shuba Gopal
Gregory Gydush
Barbara Weir
Broad Management
Alex Burgin
Anthony Philippakis
Scott Sutherland
Broad Information
Technology Services
Chris Dwan
Eric Jones
Arxspan
Jeff Carter
Kate Hardy
21
Background slides
22
Summary – Background
• One of the key challenges in conducting research in a diverse and dynamic
organization like the Broad Institute is connecting islands of related data
• Since scientific groups have traditionally been separated from each other,
relying on each other as internal suppliers and customers, their data have
similarly been separated; it is not uncommon for two groups to work on the
same cell line with no means of finding out about each other's work, partly
due to different means of tracking cell-line data
• The Broad Institute has collaborated with Arxspan to develop a configuration of
ArxLab to share a common registry of parental cell lines, allowing different
groups to have a common vocabulary about cell lines and opening
collaboration possibilities for both new science and accelerated progress on
existing science
23
What You Can Gain – Background
• Gain insight into how the Broad solved a common and intractable
issue facing a variety of diverse organizations using cloud-based,
current-generation laboratory data-management software in a manner
that can be reapplied in a variety of situations
• See how different departments within the Broad worked
collaboratively with Arxspan to solve this issue in a horizontal manner,
i.e., differently from either a bottom-up or top-down approach
• See how existing technology can be extended in demanding
scientific environments to solve long-standing collaboration issues
within a leading biomedical research organization
24

2016 Bio-IT World Cell Line Coordination 2016-04-06v1

  • 1. Enabling Cross-Group Collaboration on Cell Lines via Arxspan's ArxLab 2016/04/06, v1
  • 2. Authors • Bruce Kozuma is a projectprogram manager in the Broad Information Technology Services (BITS) department with experience in software development, operations, and IT in industries such as manufacturing, telecommunications, biotechnology, and biomedical research. • Paul Clemons is director of computational chemical biology research in the Center for the Science of Therapeutics (CSofT) at the Broad Institute. He and his team use quantitative measurement, computational, and visualization techniques to enable systematic use of small molecules to explore biology, especially disease biology. 2
  • 3. About the Broad Institute • A collaborative community pioneering a new model of biomedical science; views itself as an experiment in a new way of doing science, empowering researchers to: – Act nimbly – Work boldly – Share openly – Reach globally 3
  • 4. Current Cell Line Management State • Multiple groups creating and using cell lines at the Broad, e.g., – Project Achilles, Profiling Relative Inhibition Simultaneously in Mixtures (PRSIM), Cancer Cell Line Encyclopedia (CCLE), Center for the Science of Therapeutics (CSofT), Connectivity Map (CMAP), Center for the Development of Therapeutics (CDoT) • Some canonical sources of cell-line data at Broad, e.g., – Cancer Cell Line Dependencies Database (CDDB) • However! – Limited coordination in definitions of what constitutes a unique cell line and how changes are made to that definition over time – No effective mechanisms to curate, register, or search such definitions – No automated refresh cycle for data in CDDB 4
  • 5. Why is this a Problem? • Lack of a common platform inhibits collaboration between groups since they have to rely on external sources to know what internal research has been done on a cell line • When there is collaboration, e.g., with one group supplying cell lines and data to another group, may have issues with updating metadata, e.g., primary site change • Lack of a common vocabulary leads to data quality issues, e.g., what do you mean by Doubling Time • Velocity of scientific discovery is slower as a result 5
  • 6. Practical examples 6 • What metadata is tracked at what level? • Who decides the metadata categories and values? • How do we promote project-specific metadata to parental cell lines?
  • 7. Practical examples • Who decides two or more cell lines are the same thing? – Example: A375 and unknown cell line – Heuristic: They are the same cell line if they have the same genomic fingerprint and same source (e.g., individual and tissue type) – more measurements of sameness to be added later 7
  • 8. Desired Situation • Common cell line metadata categories and data • Defined, published, flexible processes for collaborative reviewapproval of metadata categories and data (e.g., intake, change, promotion) • Retain ability for groups to work independently on project- specific metadata and data • Technology that enables wide-spread sharing of cell-line metadata categories and data, inside and outside Broad 8
  • 10. Hypothesis: Manufacturing Practices & Appropriate Technology Can Help • Use best practices from manufacturing around master data management to build necessary organizational practices • Use technology to enable organization practices • Principles: – Technology without organizational practices is a waste – Organizational practices without enabling, sustainable use of technology will wither 10
  • 11. Cell Line Master Data Review Board • Establish a cell line master data review board to review metadata categories and data stewardshipmanagement practices – Draws from Material Review Boards in manufacturing – Provides a forum for a “coalition of the willing” to come to consensus about metadata categories before categoriesvalues are established and curate changes to before making them – Provides institutional sponsorship, above the level of individual projects, while being collaborative 11
  • 12. Cell Line Master Data Review Board Proposed SponsorshipMembership • Office of the Chief Data Officer sponsors the board to provide cross-project arbitration • Initial membership by organization: – Office of the Chief Data Officer – Office of the Chief Science Officer – Developer of the institutional database e.g., CSofT – Projects creating cell lines and metadata, e.g., PRISM, Achilles – Groups ingesting cell line metadata, e.g., Proteomics, CDoT – BITS as facilitator (works across organization, neutral about science) – Ad hoc members 12
  • 13. Cell Line Master Data Review Board Proposed PoliciesProcedures • Board mechanics: Governance, changes to membership, etc. • Develop canonical source of parental cell line definition – Assumes can use existing metadata categories and values • Initial methods 13 • Register new cell lines • Add new metadata categories • Add new metadata to existing categories • Change metadata categories or values for existing cell lines • Track provenance of names and annotations (differences left to end users to resolve)
  • 14. Framework for Sharing Cell Line Metadata • Use institutional database as the canonical source of cell line metadata • Provide means of ingesting institutional data into local data management systems to link project specific data to parental cell line data • In the local data management system, have a common registry of parental cell lines (available to all) and daughter cell lines (project specific by default) • Preserve heredity of cell lines and allow searching by such 14
  • 15. Institutional Cell Line Database Sample Entity Relationship Diagram 15 • Tracks multiple names and annotations (e.g., lineage) and the source of these claims • Has no concept of samples or instances (annotates the abstract entity only)
  • 16. Institutional Cell Line Database Sample Data Exchange Mechanism • Data exchange via JavaScript Object Notation (JSON) file: cell_sample = { cell_sample_names: [ {cell_name_type: "CCLE", cell_sample_name: "A375_SKIN"}, {cell_name_type: "cddb", cell_sample_name: "30"}, {cell_name_type: "ATCC", cell_sample_name: "A-375 [A375] (ATCC® CRL-1619™)"}] } • cell_sample_names: array of names for a cell line • cell_name_type: name space for a cell line name, e.g., CCLE, CDDB, ATCC • cell_sample_name: name for a cell line and internal priority of that name (one name may be preferred to another), e.g., – CCLE: A375_SKIN – CDDB: 30 – ATCC: A-375 [A375] (ATCC® CRL-1619™)
  • 17. Local Data Management System: Laboratory Data Management (LDM) • Project for BITS to provide centrally managed/supported solutions for management of laboratory data, divided into functions: – Data capture/archive (instruments and other sources) – Container inventory/registration (chemical, biological, hybrid)/sample management – Core Electronic Laboratory Notebook (ELN, experiment documentation/IP protection/linking to data) – Data/workflow management – Data analysis/visualization
  • 18. Cell Line Metadata and Data in LDM (diagram)
  • 19. Cell Line Metadata and Data in ArxLab (diagram)
  • 20. Next Steps for Sharing Cell Line Metadata • Work out data privacy/classification restrictions • Phased implementation for sharing data from the institutional cell line database with external systems like ArxLab – Phase 1: Import static list (e.g., JSON file) of parental cell lines (~117K) and synonyms into ArxLab Registration with type-ahead to autocomplete names, e.g., A37 shows A375 – Phase 2: Add resolution of entered names to a common cell line ID and preferred name, e.g., entering A375_SKIN resolves to A375 upon entry – Phase 3: Automatically update LDM via periodic push from the institutional cell line database, including setting up a legal framework for data distribution
  • 21. Acknowledgements • Achilles: Francesca Vazquez, Sasha Pantel, Nicole Dabkowski, Phil Montgomery, Glenn Cowley • PRISM: Chris Mader, Jen Roth, Sam Bender, Massami Laird, Ed McBride • CDDB Data Curation: Paul Clemons, Mahmoud Ghandi, Shuba Gopal, Gregory Gydush, Barbara Weir • Broad Management: Alex Burgin, Anthony Philippakis, Scott Sutherland • Broad Information Technology Services: Chris Dwan, Eric Jones • Arxspan: Jeff Carter, Kate Hardy
  • 23. Summary – Background • One of the key challenges in conducting research in a diverse and dynamic organization like the Broad Institute is connecting islands of related data • Since scientific groups have traditionally been separated from each other, relying on each other as internal suppliers and customers, their data have similarly been separated; it is not uncommon for two groups to work on the same cell line with no means of finding out about each other's work, partially due to different means of tracking cell-line data • The Broad Institute has collaborated with Arxspan to develop a configuration of ArxLab to share a common registry of parental cell lines, allowing different groups to have a common vocabulary about cell lines and opening collaboration possibilities for both new science and accelerated progress on existing science
  • 24. What You Can Gain – Background • Gain insight into how the Broad solved a common and intractable issue facing a variety of diverse organizations, using cloud-based, current-generation laboratory data-management software in a manner that can be reapplied in a variety of situations • See how different departments within the Broad worked collaboratively with Arxspan to solve this issue in a horizontal manner, i.e., differently from either a bottom-up or top-down approach • See how existing technology can be extended in demanding scientific environments to solve long-standing collaboration issues within a leading biomedical research organization
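
The exchange format on slide 16 and Phases 1 and 2 on slide 20 can be sketched in a few lines of Python. This is only an illustration, not how ArxLab or the institutional database is implemented. The slide's example is rewritten as strict JSON (quoted keys), and the common cell line ID "CL-0001", the "preferred_name" and "exchange" fields, and the function names are all hypothetical, since the slides do not say how IDs or preferred names are stored.

```python
import json

# Strict-JSON version of the slide 16 example (keys quoted, straight quotes).
EXCHANGE_JSON = """
{
  "cell_sample_names": [
    {"cell_name_type": "CCLE", "cell_sample_name": "A375_SKIN"},
    {"cell_name_type": "cddb", "cell_sample_name": "30"},
    {"cell_name_type": "ATCC", "cell_sample_name": "A-375 [A375] (ATCC® CRL-1619™)"}
  ]
}
"""

# Hypothetical registry: the ID and preferred name are assumptions for this sketch.
CELL_LINES = {
    "CL-0001": {"preferred_name": "A375", "exchange": json.loads(EXCHANGE_JSON)},
}


def build_index(cell_lines):
    """Map every case-folded name (preferred or synonym) to (cell line ID, name)."""
    index = {}
    for cell_id, entry in cell_lines.items():
        names = [entry["preferred_name"]]
        names += [n["cell_sample_name"] for n in entry["exchange"]["cell_sample_names"]]
        for name in names:
            index[name.casefold()] = (cell_id, name)
    return index


def type_ahead(index, prefix, limit=10):
    """Phase 1: list registered names that start with the typed prefix."""
    key = prefix.casefold()
    hits = sorted(name for folded, (_, name) in index.items() if folded.startswith(key))
    return hits[:limit]


def resolve(index, cell_lines, entered):
    """Phase 2: resolve an entered name to (common cell line ID, preferred name)."""
    hit = index.get(entered.strip().casefold())
    if hit is None:
        return None  # unknown name; a real system might prompt for registration
    cell_id, _ = hit
    return cell_id, cell_lines[cell_id]["preferred_name"]


INDEX = build_index(CELL_LINES)
print(type_ahead(INDEX, "A37"))
print(resolve(INDEX, CELL_LINES, "A375_SKIN"))
```

The linear scan in type_ahead is adequate for a sketch; at the ~117K parental cell lines mentioned on slide 20, a sorted list with binary search or a prefix tree would be a more likely choice.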

Editor's notes

  1. More about Bruce at LinkedIn: https://www.linkedin.com/in/bkozuma More about Paul at LinkedIn: https://www.linkedin.com/in/pclemons More about Paul at the Broad Institute: http://www.broadinstitute.org/scientific-community/science/programs/csoft/chemical-biology/paul-clemons
  2. More about the Broad Institute of MIT and Harvard: http://www.broadinstitute.org
  3. More about Achilles: http://www.broadinstitute.org/Achilles More about PRISM: https://www.broadinstitute.org/software/cprg/?q=node/67 More about CCLE: http://www.broadinstitute.org/ccle More about CSofT: http://www.broadinstitute.org/scientific-community/science/programs/csoft/center-science-therapeutics More about CMAP: http://www.broadinstitute.org/cmap More about CDoT:
  4. Sample here = synonym, not physical sample