2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1

Enabling Cross-Group Collaboration on Cell Lines via Arxspan's ArxLab, with Paul Clemons

Cell Line Metadata in ArxLab
• Create ArxLab Registration objects for parental cell lines with minimal common metadata
• Create ArxLab Registration entries for project-specific daughter cell lines with additional project-specific metadata
• Standardize where possible on ArxLab Assay definitions

Why this is a Problem
• Lack of common practices/platform inhibits collaboration between groups, since they have to rely on external sources to know what internal research has been done on a cell line
• When there is collaboration, e.g., with one group supplying cell lines and data to another group, there may be issues with updating metadata, e.g., a primary site change
• Lack of a common vocabulary leads to data quality issues, e.g., what do you mean by "Doubling Time"?
• Velocity of scientific discovery is slower as a result

Challenge
• One of the key challenges in conducting research in a diverse and dynamic organization like the Broad Institute is connecting islands of related data.
• Since scientific groups have traditionally been separated from each other, relying on each other as internal suppliers and customers, their data have similarly been separated. It is not uncommon for two groups to work on the same cell line with no means of finding out about each other's work, partially due to different means of tracking cell-line data.
• The Broad Institute has collaborated with Arxspan to develop a configuration of ArxLab to share a common registry of parental cell lines, allowing different groups to have a common vocabulary about cell lines and opening collaboration possibilities for both new science and accelerated progress on existing science.

Solution Framework
• Use the institutional database as the canonical source of cell line metadata
• Ingest institutional data into local data management systems to link project-specific data to parental cell line data
• Have a common registry of parental cell lines (available to all) and daughter cell lines (project-specific by default)
• Preserve heredity of cell lines and allow searching by it

Example
• What metadata is tracked at what level?
• Who decides the metadata categories and values?
• How do we promote project-specific metadata to parental cell lines?

Desired State
• Common cell line metadata categories and data
• Defined, published, flexible processes for collaborative review/approval of metadata categories and data (e.g., intake, change, promotion)
• Retain the ability for groups to work independently on project-specific metadata and data
• Technology that enables widespread sharing of cell-line metadata categories and data, inside and outside the Broad

Hypothesis
• Use best practices from manufacturing around master data management (e.g., Master Data Review board) to build the necessary organizational practices
• Use technology to enable organizational processes
• Principles:
  o Technology without organizational processes is a waste
  o Organizational processes without enabling, sustainable use of technology will wither

Institutional Cell Line Database Sample Entity Relationship Diagram
• Tracks multiple names and annotations (e.g., lineage) and the source of these claims
• Has no concept of samples or instances (annotates the abstract entity only)
• cell_sample: Name space for a cell line name, e.g., CCLE, CDDB, ATCC

Data exchange via JavaScript Object Notation (JSON) file:

    cell_sample = {
      cell_sample_names: [
        {cell_name_type: "CCLE", cell_sample_name: "A375_SKIN"},
        {cell_name_type: "cddb", cell_sample_name: "30"},
        {cell_name_type: "ATCC", cell_sample_name: "ATCC: A-375 [A375] (ATCC® CRL-1619™)"}
      ]
    }

• cell_name_type: Name for cell line and internal priority of that name, e.g., may prefer one name to another name
• cell_sample_name: array of names for a cell line, e.g.,
  o CCLE: A375_SKIN
  o CDDB: 30
  o ATCC: A-375 [A375] (ATCC® CRL-1619™)

Bruce Kozuma, PMP, CPIM
Broad Institute
bkozuma@broadinstitute.org

Current State
• Multiple groups create and use cell lines at the Broad, e.g., Achilles, PRISM, Cancer Cell Line Encyclopedia (CCLE)
• There are some canonical sources of cell-line data at the Broad, e.g., Cancer Cell Line Dependencies Database (CDDB)
• However:
  o Limited coordination in definitions of what constitutes a unique cell line and how changes are made to that definition over time
  o No effective mechanisms to curate, register, or search such definitions
  o No automated refresh cycle for data in CDDB

Credits
• CDD Data Curation: Paul Clemons, Mahmoud Ghandi, Shuba Gopal, Gregory Gydush, Barbara Weir
• Achilles: Francesca Vazquez, Sasha Pantel, Nicole Dabkowski, Phil Montgomery, Glenn Cowley
• PRISM: Chris Mader, Jen Roth, Sam Bender, Massami Laird, Ed McBride
• Broad Management: Alex Burgin, Anthony Philippakis, Scott Sutherland
• BITS: Chris Dwan, Eric Jones
• Arxspan: Jeff Carter, Kate Hardy
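
Illustration: reading the JSON name record

The JSON data exchange described on this poster could be consumed along the following lines. This is only a minimal sketch, not the Broad's or Arxspan's implementation. The field names cell_sample_names, cell_name_type, and cell_sample_name come from the poster's example; the function names, the priority order in NAME_TYPE_PRIORITY, and the case-insensitive matching of name spaces (the poster writes both "cddb" and "CDDB") are assumptions made for illustration.

```python
import json

# Assumed preference order for name spaces. The poster says names carry an
# internal priority but does not state the actual ordering.
NAME_TYPE_PRIORITY = ["CCLE", "ATCC", "CDDB"]

# The poster's example record, written as strict JSON (quoted keys).
RECORD_JSON = """
{
  "cell_sample_names": [
    {"cell_name_type": "CCLE", "cell_sample_name": "A375_SKIN"},
    {"cell_name_type": "cddb", "cell_sample_name": "30"},
    {"cell_name_type": "ATCC",
     "cell_sample_name": "ATCC: A-375 [A375] (ATCC® CRL-1619™)"}
  ]
}
"""


def names_by_type(record):
    """Map each name space (upper-cased, an assumption) to its cell line name."""
    return {
        entry["cell_name_type"].upper(): entry["cell_sample_name"]
        for entry in record["cell_sample_names"]
    }


def preferred_name(record, priority=NAME_TYPE_PRIORITY):
    """Return the name from the first name space in `priority` that is present."""
    names = names_by_type(record)
    for name_type in priority:
        if name_type in names:
            return names[name_type]
    raise KeyError("record has no name in any of the given name spaces")


record = json.loads(RECORD_JSON)
print(preferred_name(record))            # name from the highest-priority name space
print(names_by_type(record)["CDDB"])     # look up a specific name space
```

Keeping every name alongside its name space, rather than choosing a single name up front, lets each group search by the identifier it already uses while still resolving to the same parental cell line.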