2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
1. Cell Line Metadata in ArxLab
• Create ArxLab Registration objects for
parental cell lines with minimal common
metadata
• Create ArxLab Registration entries for
project-specific daughter cell lines with
additional project-specific metadata
• Standardize where possible on ArxLab
Assay definitions
Why this is a Problem
• Lack of common practices or a common platform
inhibits collaboration between
groups since they have to rely on
external sources to know what internal
research has been done on a cell line
• When there is collaboration, e.g., with
one group supplying cell lines and data
to another group, they may have issues with
updating metadata, e.g., a primary site
change
• Lack of a common vocabulary leads to
data quality issues, e.g., what do you
mean by "Doubling Time"?
• Velocity of scientific discovery is
slower as a result
Challenge
• One of the key challenges in conducting research in a diverse and dynamic organization like the Broad
Institute is connecting islands of related data.
• Since scientific groups have traditionally been separated from each other, relying on each other as internal suppliers and
customers, their data have similarly been separated; it is not uncommon for two groups to work on the same cell line but
have no means of finding out about each other's work, partly because they track cell-line data in different ways
• The Broad Institute has collaborated with Arxspan to develop a configuration of ArxLab to share a common registry of parental
cell lines, allowing different groups to have a common vocabulary about cell lines and opening collaboration possibilities for both
new science and accelerated progress on existing science
Solution Framework
• Use institutional
database as the
canonical source of
cell line metadata
• Ingest institutional data into local
data management
systems to link
project-specific data
to parental cell line data
• Have a common registry of parental
cell lines (available to all) and
daughter cell lines (project specific
by default)
• Preserve heredity of cell lines and
allow searching by lineage
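The last two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CellLine` and `Registry` classes, their fields, and the daughter cell line names are hypothetical and are not part of ArxLab or the institutional database. The idea is that each daughter entry stores a reference to its parent, which is enough to search lineage in either direction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CellLine:
    """Hypothetical registry entry (illustrative only, not an ArxLab object)."""
    name: str
    parent: Optional[str] = None    # None marks a parental cell line
    project: Optional[str] = None   # daughter lines are project specific by default
    metadata: Dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory registry of parental and daughter cell lines."""

    def __init__(self) -> None:
        self._lines: Dict[str, CellLine] = {}

    def register(self, line: CellLine) -> None:
        # A daughter can only be registered under an already known parent.
        if line.parent is not None and line.parent not in self._lines:
            raise KeyError(f"unknown parent: {line.parent}")
        self._lines[line.name] = line

    def ancestors(self, name: str) -> List[str]:
        """Walk parent links from a cell line up to its parental cell line."""
        chain: List[str] = []
        current = self._lines[name].parent
        while current is not None:
            chain.append(current)
            current = self._lines[current].parent
        return chain

    def descendants(self, name: str) -> List[str]:
        """Return every line derived, directly or indirectly, from `name`."""
        found: List[str] = []
        stack = [name]
        while stack:
            parent = stack.pop()
            for line in self._lines.values():
                if line.parent == parent:
                    found.append(line.name)
                    stack.append(line.name)
        return sorted(found)


# Usage with made-up daughter names and a made-up project label.
registry = Registry()
registry.register(CellLine("A375_SKIN"))
registry.register(CellLine("A375_SKIN_daughter_1", parent="A375_SKIN", project="project_x"))
registry.register(CellLine("A375_SKIN_daughter_2", parent="A375_SKIN_daughter_1", project="project_x"))

print(registry.ancestors("A375_SKIN_daughter_2"))
print(registry.descendants("A375_SKIN"))
```

Storing only the parent link keeps each entry small; a production registry would more likely index children as well, rather than scanning every entry as this sketch does.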
Example
• What metadata is tracked at what level?
• Who decides the metadata categories
and values?
• How do we promote project-specific
metadata to parental cell lines?
Desired State
• Common cell line metadata categories
and data
• Defined, published, flexible processes
for collaborative review/approval of
metadata categories and data (e.g.,
intake, change, promotion)
• Retain ability for groups to work
independently on project-specific
metadata and data
• Technology that enables widespread
sharing of cell-line metadata categories
and data, inside and outside Broad
Hypothesis
• Use best practices from
manufacturing around
master data management
(e.g., Master Data Review board) to
build necessary organizational
practices
• Use technology to enable
organizational processes
• Principles:
o Technology without
organizational
processes is a waste
o Organizational processes without
enabling, sustainable use of
technology will wither
Institutional Cell Line Database
Sample Entity Relationship Diagram
• Tracks multiple names and
annotations (e.g., lineage) and
the source of these claims
• Has no concept of samples or
instances (annotates the
abstract entity only)
• cell_sample: Name space
for a cell line name, e.g., CCLE,
CDDB, ATCC
Enabling Cross-Group
Collaboration on Cell Lines
Data exchange via JavaScript Object Notation (JSON) file:
cell_sample = { cell_sample_names: [
{cell_name_type: "CCLE", cell_sample_name: "A375_SKIN"},
{cell_name_type: "cddb", cell_sample_name: "30"},
{cell_name_type: "ATCC", cell_sample_name: "ATCC: A-375 [A375] (ATCC® CRL-1619™)"} ] }
• cell_name_type: Name for
cell line and internal priority of
that name, e.g., may prefer one
name to another name
• cell_sample_name: array of
names for a cell line, e.g.,
o CCLE: A375_SKIN
o CDDB: 30
o ATCC: A-375 [A375] (ATCC®
CRL-1619™)
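As a rough sketch of how a consuming group might read such an exchange file and pick one preferred name, the Python below parses the example record and applies a priority order over name types. Several things here are assumptions: the record is rewritten as strict JSON (quoted keys, no `cell_sample =` assignment), the priority order in `NAME_TYPE_PRIORITY` is invented because the poster does not give the actual ordering, and name types are compared case-insensitively so that "cddb" and "CDDB" match. The function names are hypothetical.

```python
import json
from typing import Dict, List, Optional

# Assumed priority of name types, most preferred first. The poster states
# that a priority exists but does not give the actual ordering.
NAME_TYPE_PRIORITY: List[str] = ["CCLE", "ATCC", "CDDB"]

# The poster's example record, rewritten as strict JSON for this sketch.
EXAMPLE = """
{"cell_sample_names": [
  {"cell_name_type": "CCLE", "cell_sample_name": "A375_SKIN"},
  {"cell_name_type": "cddb", "cell_sample_name": "30"},
  {"cell_name_type": "ATCC", "cell_sample_name": "ATCC: A-375 [A375] (ATCC® CRL-1619™)"}
]}
"""


def names_by_type(cell_sample: dict) -> Dict[str, str]:
    """Map each upper-cased name type to the cell line name in that namespace."""
    return {
        entry["cell_name_type"].upper(): entry["cell_sample_name"]
        for entry in cell_sample["cell_sample_names"]
    }


def preferred_name(cell_sample: dict,
                   priority: List[str] = NAME_TYPE_PRIORITY) -> Optional[str]:
    """Return the name from the highest-priority name type present, else None."""
    names = names_by_type(cell_sample)
    for name_type in priority:
        if name_type in names:
            return names[name_type]
    return None


cell_sample = json.loads(EXAMPLE)
print(preferred_name(cell_sample))            # prints A375_SKIN
print(preferred_name(cell_sample, ["CDDB"]))  # prints 30
```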
Bruce Kozuma, PMP, CPIM
Broad Institute
bkozuma@broadinstitute.org
Current State
• Multiple groups creating and using cell
lines at the Broad, e.g., Achilles, PRISM,
Cancer Cell Line Encyclopedia (CCLE)
• Some canonical sources of cell-line
data at Broad, e.g., Cancer Cell Line
Dependencies Database (CDDB)
• However!
o Limited coordination in definitions
of what constitutes a unique cell line
and how changes are made to that
definition over time
o No effective mechanisms to curate,
register, or search such definitions
o No automated refresh cycle for data
in CDDB
Credits
CDD Data Curation
Paul Clemons
Mahmoud Ghandi
Shuba Gopal
Gregory Gydush
Barbara Weir
Achilles
Francesca Vazquez
Sasha Pantel
Nicole Dabkowski
Phil Montgomery
Glenn Cowley
PRISM
Chris Mader
Jen Roth
Sam Bender
Massami Laird
Ed McBride
Broad Management
Alex Burgin
Anthony Philippakis
Scott Sutherland
BITS
Chris Dwan
Eric Jones
Arxspan
Jeff Carter
Kate Hardy