2016 Bio-IT World Cell Line Coordination 2016-04-06v1

Enabling Cross-Group Collaboration on Cell Lines via Arxspan’s ArxLab, with Paul Clemons

Published in: Technology

  1. Enabling Cross-Group Collaboration on Cell Lines via Arxspan's ArxLab (2016/04/06, v1)
  2. Authors
     • Bruce Kozuma is a project/program manager in the Broad Information Technology Services (BITS) department with experience in software development, operations, and IT in industries such as manufacturing, telecommunications, biotechnology, and biomedical research.
     • Paul Clemons is director of computational chemical biology research in the Center for the Science of Therapeutics (CSofT) at the Broad Institute. He and his team use quantitative measurement, computational, and visualization techniques to enable systematic use of small molecules to explore biology, especially disease biology.
  3. About the Broad Institute
     • A collaborative community pioneering a new model of biomedical science; views itself as an experiment in a new way of doing science, empowering researchers to:
       – Act nimbly
       – Work boldly
       – Share openly
       – Reach globally
  4. Current Cell Line Management State
     • Multiple groups create and use cell lines at the Broad, e.g.,
       – Project Achilles, Profiling Relative Inhibition Simultaneously in Mixtures (PRISM), Cancer Cell Line Encyclopedia (CCLE), Center for the Science of Therapeutics (CSofT), Connectivity Map (CMAP), Center for the Development of Therapeutics (CDoT)
     • Some canonical sources of cell-line data exist at the Broad, e.g.,
       – Cancer Cell Line Dependencies Database (CDDB)
     • However!
       – Limited coordination in definitions of what constitutes a unique cell line and how changes are made to that definition over time
       – No effective mechanisms to curate, register, or search such definitions
       – No automated refresh cycle for data in CDDB
  5. Why Is This a Problem?
     • Lack of a common platform inhibits collaboration between groups, since they have to rely on external sources to know what internal research has been done on a cell line
     • When there is collaboration, e.g., one group supplying cell lines and data to another group, there may be issues with updating metadata, e.g., a primary site change
     • Lack of a common vocabulary leads to data quality issues, e.g., what is meant by "doubling time"?
     • Velocity of scientific discovery is slower as a result
  6. Practical examples
     • What metadata is tracked at what level?
     • Who decides the metadata categories and values?
     • How do we promote project-specific metadata to parental cell lines?
  7. Practical examples
     • Who decides two or more cell lines are the same thing?
       – Example: A375 and unknown cell line
       – Heuristic: They are the same cell line if they have the same genomic fingerprint and same source (e.g., individual and tissue type); more measurements of sameness to be added later
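
The sameness heuristic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CellLineRecord` type, its field names, and the sample values are hypothetical, and the genomic fingerprint is treated as an opaque string compared for exact equality, whereas a real fingerprint comparison would likely need a tolerance-based match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellLineRecord:
    """Hypothetical record; field names are assumptions, not the Broad's schema."""
    name: str
    fingerprint: str  # assumed: canonical string or digest of the genomic fingerprint
    individual: str   # assumed: identifier of the source individual
    tissue_type: str


def identity_key(record):
    """Key implementing the heuristic: same fingerprint and same source."""
    return (record.fingerprint, record.individual, record.tissue_type)


def same_cell_line(a, b):
    """True when two records match on fingerprint, individual, and tissue type."""
    return identity_key(a) == identity_key(b)


def group_same(records):
    """Group record names that the heuristic treats as one cell line."""
    groups = {}
    for record in records:
        groups.setdefault(identity_key(record), []).append(record.name)
    return list(groups.values())


# Made-up values for illustration only.
records = [
    CellLineRecord("A375", "fp-001", "donor-1", "skin"),
    CellLineRecord("unknown-line", "fp-001", "donor-1", "skin"),
    CellLineRecord("other-line", "fp-002", "donor-2", "lung"),
]
```

Keeping the key in one function means that further measurements of sameness, as the slide anticipates, could be added in a single place.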
  8. Desired Situation
     • Common cell line metadata categories and data
     • Defined, published, flexible processes for collaborative review/approval of metadata categories and data (e.g., intake, change, promotion)
     • Retain ability for groups to work independently on project-specific metadata and data
     • Technology that enables widespread sharing of cell-line metadata categories and data, inside and outside the Broad
  9. Cell Lines Metadata
  10. Hypothesis: Manufacturing Practices & Appropriate Technology Can Help
      • Use best practices from manufacturing around master data management to build the necessary organizational practices
      • Use technology to enable organizational practices
      • Principles:
        – Technology without organizational practices is a waste
        – Organizational practices without enabling, sustainable use of technology will wither
  11. Cell Line Master Data Review Board
      • Establish a cell line master data review board to review metadata categories and data stewardship/management practices
        – Draws from Material Review Boards in manufacturing
        – Provides a forum for a “coalition of the willing” to come to consensus about metadata categories before categories/values are established, and to curate changes before making them
        – Provides institutional sponsorship, above the level of individual projects, while being collaborative
  12. Cell Line Master Data Review Board: Proposed Sponsorship/Membership
      • Office of the Chief Data Officer sponsors the board to provide cross-project arbitration
      • Initial membership by organization:
        – Office of the Chief Data Officer
        – Office of the Chief Science Officer
        – Developer of the institutional database, e.g., CSofT
        – Projects creating cell lines and metadata, e.g., PRISM, Achilles
        – Groups ingesting cell line metadata, e.g., Proteomics, CDoT
        – BITS as facilitator (works across organization, neutral about science)
        – Ad hoc members
  13. Cell Line Master Data Review Board: Proposed Policies/Procedures
      • Board mechanics: governance, changes to membership, etc.
      • Develop canonical source of parental cell line definition
        – Assumes existing metadata categories and values can be used
      • Initial methods
        – Register new cell lines
        – Add new metadata categories
        – Add new metadata to existing categories
        – Change metadata categories or values for existing cell lines
        – Track provenance of names and annotations (differences left to end users to resolve)
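
One possible shape for the initial methods above is a small registry that stores every annotation together with its source, so that differing values stay visible instead of being silently overwritten. The sketch below is an assumption-laden illustration, not the board's actual procedure: the class `ParentalCellLineRegistry`, the `Claim` type, and the source labels `group_a` and `group_b` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One annotation plus where it came from (hypothetical structure)."""
    category: str
    value: str
    source: str


class ParentalCellLineRegistry:
    """Hypothetical registry that keeps provenance for every annotation."""

    def __init__(self):
        self._claims = {}  # cell line ID -> list of Claim

    def register(self, cell_line_id):
        """Register a new cell line; re-registering an ID is an error."""
        if cell_line_id in self._claims:
            raise ValueError(f"{cell_line_id} is already registered")
        self._claims[cell_line_id] = []

    def add_claim(self, cell_line_id, category, value, source):
        """Record a value for a category, tagged with the claiming source."""
        if cell_line_id not in self._claims:
            raise KeyError(f"{cell_line_id} is not registered")
        self._claims[cell_line_id].append(Claim(category, value, source))

    def claims(self, cell_line_id):
        """Return all claims for a cell line in insertion order."""
        return list(self._claims[cell_line_id])

    def differences(self, cell_line_id):
        """Categories with more than one distinct value, left for users to resolve."""
        values = defaultdict(set)
        for claim in self._claims[cell_line_id]:
            values[claim.category].add(claim.value)
        return {cat: sorted(vals) for cat, vals in values.items() if len(vals) > 1}


# Made-up annotations for illustration only.
registry = ParentalCellLineRegistry()
registry.register("A375")
registry.add_claim("A375", "primary_site", "skin", source="group_a")
registry.add_claim("A375", "primary_site", "unknown", source="group_b")
registry.add_claim("A375", "lineage", "melanoma", source="group_a")
```

Here `registry.differences("A375")` reports only `primary_site`, the one category on which the two hypothetical sources disagree.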
  14. Framework for Sharing Cell Line Metadata
      • Use the institutional database as the canonical source of cell line metadata
      • Provide a means of ingesting institutional data into local data management systems to link project-specific data to parental cell line data
      • In the local data management system, have a common registry of parental cell lines (available to all) and daughter cell lines (project-specific by default)
      • Preserve heredity of cell lines and allow searching by it
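
A minimal sketch of such a registry, assuming each cell line stores a pointer to its parent: parental lines have no parent and are visible to everyone, while daughter lines carry a project and are visible only to that project. The `CellLine` type, the helper names, and the sample IDs and project names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellLine:
    """Hypothetical registry entry; parental lines have parent_id None."""
    id: str
    parent_id: Optional[str] = None
    project: Optional[str] = None  # owning project for daughter lines


def lineage(registry, cell_line_id):
    """Walk from a cell line up to its parental line, nearest ancestor first."""
    chain, seen = [], set()
    current = cell_line_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in heredity at {current}")
        seen.add(current)
        chain.append(current)
        current = registry[current].parent_id
    return chain


def descendants(registry, cell_line_id):
    """All lines derived, directly or indirectly, from the given line."""
    children = {}
    for line in registry.values():
        if line.parent_id is not None:
            children.setdefault(line.parent_id, []).append(line.id)
    found, stack = [], [cell_line_id]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)


def visible_to(registry, project):
    """Parental lines are shared; daughter lines are project-specific by default."""
    return sorted(
        line.id for line in registry.values()
        if line.parent_id is None or line.project == project
    )


# Made-up entries for illustration only.
lines = [
    CellLine("A375"),
    CellLine("A375-d1", parent_id="A375", project="project_x"),
    CellLine("A375-d1a", parent_id="A375-d1", project="project_x"),
    CellLine("A375-d2", parent_id="A375", project="project_y"),
]
registry = {line.id: line for line in lines}
```

Searching by heredity then reduces to `lineage` (upward) and `descendants` (downward) over the same parent pointers.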
  15. Institutional Cell Line Database: Sample Entity Relationship Diagram
      • Tracks multiple names and annotations (e.g., lineage) and the source of these claims
      • Has no concept of samples or instances (annotates the abstract entity only)
  16. Institutional Cell Line Database: Sample Data Exchange Mechanism
      • Data exchange via JavaScript Object Notation (JSON) file:
          cell_sample = {
            cell_sample_names: [
              {cell_name_type: "CCLE", cell_sample_name: "A375_SKIN"},
              {cell_name_type: "cddb", cell_sample_name: "30"},
              {cell_name_type: "ATCC", cell_sample_name: "A-375 [A375] (ATCC® CRL-1619™)"}
            ]
          }
      • cell_sample: Name space for a cell line name, e.g., CCLE, CDDB, ATCC
      • cell_name_type: Name for a cell line and internal priority of that name, e.g., may prefer one name to another name
      • cell_sample_name: array of names for a cell line, e.g.,
        – CCLE: A375_SKIN
        – CDDB: 30
        – ATCC: A-375 [A375] (ATCC® CRL-1619™)
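
As a rough illustration of consuming such a file, the sketch below parses the slide's sample with Python's standard `json` module. Strict JSON requires quoted keys, so the payload is rewritten accordingly. Going by the sample data, `cell_name_type` is read here as the naming authority and `cell_sample_name` as the name itself; that reading, the function name `names_by_type`, and the upper-casing of the type (so that `cddb` and `CDDB` match) are assumptions.

```python
import json

# The slide's sample, rewritten as strict JSON (keys quoted).
EXCHANGE_JSON = """
{
  "cell_sample_names": [
    {"cell_name_type": "CCLE", "cell_sample_name": "A375_SKIN"},
    {"cell_name_type": "cddb", "cell_sample_name": "30"},
    {"cell_name_type": "ATCC", "cell_sample_name": "A-375 [A375] (ATCC® CRL-1619™)"}
  ]
}
"""


def names_by_type(payload):
    """Map each name type (upper-cased) to the cell line name under that type."""
    return {
        entry["cell_name_type"].upper(): entry["cell_sample_name"]
        for entry in payload["cell_sample_names"]
    }


cell_sample = json.loads(EXCHANGE_JSON)
names = names_by_type(cell_sample)
```

With this mapping, `names["CDDB"]` yields the CDDB identifier `"30"` for the same cell line that CCLE calls `A375_SKIN`.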
  17. Local Data Management System: Laboratory Data Management (LDM)
      • Project for BITS to provide centrally managed/supported solutions for management of laboratory data, divided into functions:
        – Data capture/archive (instruments and other sources)
        – Container inventory/registration (chemical, biological, hybrid)/sample management
        – Core Electronic Laboratory Notebook (ELN; experiment documentation/IP protection/linking to data)
        – Data/workflow management
        – Data analysis/visualization
  18. Cell Line Metadata and Data in LDM
  19. Cell Line Metadata and Data in ArxLab
  20. Next Steps for Sharing Cell Line Metadata
      • Work out data privacy/classification restrictions
      • Phased implementation for sharing data from the institutional cell line database with external systems like ArxLab
        – Phase 1: Import static list (e.g., JSON file) of parental cell lines (~117K) and synonyms into ArxLab Registration, with type-ahead to autocomplete names, e.g., A37 shows A375
        – Phase 2: Add resolution of entered names to a common cell line ID and preferred name, e.g., entering A375_SKIN resolves to A375 upon entry
        – Phase 3: Automatically update LDM via periodic push from the institutional cell line database, including setting up a legal framework for data distribution
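
Phases 1 and 2 can be illustrated with a small in-memory sketch: a sorted list of names supports prefix type-ahead via binary search, and a synonym table resolves an entered name to its preferred name. This is not how ArxLab Registration works internally, which the slides do not describe; the `NameResolver` class, its methods, case-insensitive matching, and the sample entries (including `HYPOTHETICAL_LINE`) are assumptions.

```python
import bisect


class NameResolver:
    """Hypothetical type-ahead and synonym resolution over a static name list."""

    def __init__(self, synonyms):
        # synonyms maps every known name (preferred or not) to the preferred name.
        self._preferred = {name.upper(): pref for name, pref in synonyms.items()}
        self._sorted = sorted(self._preferred)

    def suggest(self, prefix, limit=10):
        """Phase 1: return up to `limit` known names starting with the prefix."""
        wanted = prefix.upper()
        index = bisect.bisect_left(self._sorted, wanted)
        matches = []
        while index < len(self._sorted) and len(matches) < limit:
            name = self._sorted[index]
            if not name.startswith(wanted):
                break
            matches.append(name)
            index += 1
        return matches

    def resolve(self, entered):
        """Phase 2: map an entered name to its preferred name, or None if unknown."""
        return self._preferred.get(entered.strip().upper())


# Made-up synonym table for illustration; a real import would come from the JSON file.
resolver = NameResolver({
    "A375": "A375",
    "A375_SKIN": "A375",
    "A-375": "A375",
    "HYPOTHETICAL_LINE": "HYPOTHETICAL_LINE",
})
```

Because the name list is sorted once at load time, each lookup is a binary search followed by a short scan, which should remain practical for a static list on the order of the ~117K parental lines mentioned above.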
  21. Acknowledgements
      • Achilles: Francesca Vazquez, Sasha Pantel, Nicole Dabkowski, Phil Montgomery, Glenn Cowley
      • PRISM: Chris Mader, Jen Roth, Sam Bender, Massami Laird, Ed McBride
      • CDDB Data Curation: Paul Clemons, Mahmoud Ghandi, Shuba Gopal, Gregory Gydush, Barbara Weir
      • Broad Management: Alex Burgin, Anthony Philippakis, Scott Sutherland
      • Broad Information Technology Services: Chris Dwan, Eric Jones
      • Arxspan: Jeff Carter, Kate Hardy
  22. Background slides
  23. Summary – Background
      • One of the key challenges in conducting research in a diverse and dynamic organization like the Broad Institute is connecting islands of related data
      • Since scientific groups have traditionally been separated from each other, relying on each other as internal suppliers and customers, their data have similarly been separated; it is not uncommon for two groups to work on the same cell line with no means of finding out about each other's work, partially due to different means of tracking cell-line data
      • The Broad Institute has collaborated with Arxspan to develop a configuration of ArxLab to share a common registry of parental cell lines, allowing different groups to have a common vocabulary about cell lines and opening collaboration possibilities for both new science and accelerated progress on existing science
  24. What You Can Gain – Background
      • Gain insight into how the Broad solved a common and intractable issue facing a variety of organizations, using cloud-based, current-generation laboratory data-management software in a manner that can be reapplied in a variety of situations
      • See how different departments within the Broad worked collaboratively with Arxspan to solve this issue in a horizontal manner, i.e., differently from either a bottom-up or top-down approach
      • See how existing technology can be extended in demanding scientific environments to solve long-standing collaboration issues within a leading biomedical research organization