BioData World Basel 2018

NGS: How what we are measuring impacts data models, and implications for data commons
Anne Deslattes Mays, PhD
Principal Computational Scientist

How do we handle the disruption that new measurement technologies bring to our data ecosystem?

What does this mean for data science?

How do we become better data stewards?

11/29/2018 BioDataWorld Congress - Basel
Disclaimer
This presentation was prepared by Anne Deslattes Mays, PhD, in her personal capacity.
The opinions expressed in this presentation are the author's own and do not necessarily
reflect the views and opinions of The Jackson Laboratory.

Talk Overview
1. Introduction to The Jackson Laboratory
2. What is Next Generation Sequencing (NGS) data used for today?
3. How do we handle the disruptions new measurement technologies bring?
4. What is proper data stewardship for data science?
5. What does this mean for data commons?
6. How do we capture the context and precision of measurements?

The Jackson Laboratory (https://www.jax.org/)
To discover precise genomic solutions for disease and empower the
global biomedical community in the shared quest to improve
human health.


Recent News

JAX® Mice, Clinical and Research Services
1. More than 10,000 mouse strains supporting biomedical research
2. More than 80% of research publications citing mouse strains use JAX® Mice
3. More than 30,000 peer-reviewed publications cite use of JAX® Mice
4. More than 22,000 genetically diverse background strains cryopreserved
5. More than 2,500 strains successfully cryorecovered by JAX each year
6. More than 75 new CRISPR-created models on different genetic backgrounds
7. Every month, hundreds of publications reference JAX® Mice strains

JAX Clinical Genomics Laboratory (CGL) Offerings: Cancer and Inherited Disorders
Sample Types Validated for Testing:
- FFPE: Formalin-Fixed Paraffin-Embedded tissue (solid tumors)
- Cell-free DNA
- Whole blood
- Buccal swab
- Saliva
Honey Reddi, PhD, FACMG, Clinical Lab Director

Who Do We Serve?
Clinicians; Pharma + Biotech; Academia + JAX PIs
Offerings across these groups:
- CLIA-validated tests
- Research assays
- Custom assay development

Assays for Confirmation of Variants
Nucleic Acid | Confirmatory (Research or Orthogonal) Technology | Types of Variants Identified
DNA | ddPCR ($570/sample) | SNPs, CNVs
DNA | Sanger ($400/sample) | SNPs, InDels
RNA | RT-PCR ($342/sample) | Fusions
48-60 samples/run; turnaround time (TAT) of ~6 days if primers/probes are available in-house.

Research Assays: PDX
A suite of assays for mutational and expression analysis of patient-derived
xenograft (PDX) tissue, including PDX filtering.
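PDX filtering generally means removing mouse-derived reads, since the human tumor grows inside a mouse host and the sequenced tissue contains both. The sketch below is a minimal illustration, not the CGL pipeline. It assumes each read has already been aligned separately to the human and mouse genomes and its best alignment score recorded; the input layout and function names are hypothetical. Each read is assigned to whichever genome it aligns to better, in the spirit of score-comparison tools such as Disambiguate.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: read_id -> (best human alignment score, best mouse alignment score).
# None means the read did not align to that genome at all.
ScorePair = Tuple[Optional[int], Optional[int]]


def classify_read(human_score: Optional[int], mouse_score: Optional[int]) -> str:
    """Label a read 'human', 'mouse', 'ambiguous' (equal scores) or 'unaligned'."""
    if human_score is None and mouse_score is None:
        return "unaligned"
    if mouse_score is None or (human_score is not None and human_score > mouse_score):
        return "human"
    if human_score is None or mouse_score > human_score:
        return "mouse"
    return "ambiguous"


def filter_pdx_reads(scores: Dict[str, ScorePair]) -> List[str]:
    """Keep only reads that align better to the human (tumor) genome."""
    return [read_id for read_id, (h, m) in scores.items() if classify_read(h, m) == "human"]


if __name__ == "__main__":
    example = {"r1": (60, 40), "r2": (30, 58), "r3": (50, 50), "r4": (45, None)}
    print(filter_pdx_reads(example))
```

Treating ties as "ambiguous" rather than keeping them is a conservative choice: reads from regions conserved between human and mouse cannot be attributed to either host or tumor from the scores alone.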

Clinical Knowledge Base

Scientific Services at JAX
1. JAX-GM Cellular Engineering ✔️
2. Microbial Genomics
3. Single Cell Biology
4. Genome Technologies
5. Center for Biometric Analysis
6. PDX Research and Development
7. Microscopy Services
8. Flow Cytometry
9. Mass Spectrometry and Protein Chemistry
10. Monoclonal Antibody Services
✔️ = using NGS technologies

Gordon Bell Prize, Supercomputing 2018
750,000 human genotypes, associated with more than a billion medical records
over a 20-year period.

Oxford Nanopore Offerings

Human poly(A) transcriptome
Workman, Rachael E., et al. "Nanopore native RNA sequencing of a human
poly(A) transcriptome." bioRxiv (2018): 459529.

PacBio vs Oxford Nanopore Sequencing
https://blog.genohub.com/2017/06/16/pacbio-vs-oxford-nanopore-sequencing/

PacBio Consensus Accuracy > 99%
"Raw PacBio reads also differ in error types (more indels than mismatches) and
have a much higher abundance (∼13–15%, Table 1), though they are spread
randomly across the reads (25,26). This randomness enables highly accurate
consensuses (>99%) to be built up rapidly by sequencing multiple times the
same molecule (CCS reads)."
Simon Ardui, Adam Ameur, Joris R Vermeesch, Matthew S Hestand. Single molecule real-time (SMRT) sequencing comes of
age: applications and utilities for medical diagnostics. Nucleic Acids Research, Volume 46, Issue 5, 16 March 2018, Pages
2159–2168, https://doi.org/10.1093/nar/gky066
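The quoted claim, that randomly distributed errors let repeated passes over the same molecule build an accurate consensus, can be illustrated with a simplified binomial model. This is a sketch under stated assumptions, not PacBio's CCS algorithm (which uses alignment-based consensus and handles indels): errors are independent across passes, each pass has per-base error rate p (0.15 here, the upper end of the quoted 13–15%), and the consensus is a simple majority vote over an odd number of passes. The model is pessimistic, because it counts every position where a majority of passes are wrong as a consensus error, even though random errors rarely agree on the same wrong base.

```python
from math import comb


def majority_consensus_error(p: float, passes: int) -> float:
    """Probability that a majority of `passes` independent reads are wrong at one base.

    Simplified model: independent errors with per-pass error rate p, majority vote.
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd, positive number of passes")
    threshold = passes // 2 + 1  # smallest number of wrong passes that outvotes the rest
    return sum(
        comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(threshold, passes + 1)
    )


def min_passes(p: float, target_error: float) -> int:
    """Smallest odd number of passes whose modeled consensus error is below the target."""
    passes = 1
    while majority_consensus_error(p, passes) >= target_error:
        passes += 2
    return passes


if __name__ == "__main__":
    for n in (1, 3, 5, 7, 9):
        print(n, round(majority_consensus_error(0.15, n), 4))
    print("passes for >99% accuracy:", min_passes(0.15, 0.01))
```

Under these assumptions, min_passes(0.15, 0.01) returns 9: seven passes still leave a modeled per-base error of about 1.2%, while nine passes bring it to about 0.56%.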

All measurements taken on biological samples are made within the context of
instrument limitations, the procedures followed in preparing samples for
measurement, and the condition and context of the samples being measured.
Raw result data, quality data, metadata, and the procedures used to transform
measurement data from the instrument and/or the experimental procedures
are best captured at the time of experimental design, to aid in primary and
secondary processing.
Biological Sample Details Need Metadata
Library Construction Details Need Metadata
Instrument Details Need Metadata
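The three "Need Metadata" items above can be made concrete as a structured record captured at experimental design time. The sketch below is a minimal illustration, assuming a plain Python dataclass model; every field name is hypothetical, chosen to echo questions raised elsewhere in this talk (which tissue, when and where it was collected, fragmented or unfragmented library, instrument version, chemistry). A missing() check lists gaps before the data leave the lab.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class SampleDetails:
    tissue: Optional[str] = None            # hypothetical: ideally an ontology term
    collection_date: Optional[str] = None
    collection_site: Optional[str] = None


@dataclass
class LibraryDetails:
    fragmented: Optional[bool] = None       # short-read (fragmented) vs long-read (unfragmented)
    protocol: Optional[str] = None


@dataclass
class InstrumentDetails:
    platform: Optional[str] = None
    model_version: Optional[str] = None
    chemistry: Optional[str] = None


@dataclass
class MeasurementMetadata:
    sample: SampleDetails = field(default_factory=SampleDetails)
    library: LibraryDetails = field(default_factory=LibraryDetails)
    instrument: InstrumentDetails = field(default_factory=InstrumentDetails)

    def missing(self) -> List[str]:
        """Return dotted names of every field that has not been filled in."""
        gaps = []
        for section in fields(self):
            part = getattr(self, section.name)
            for item in fields(part):
                if getattr(part, item.name) is None:
                    gaps.append(f"{section.name}.{item.name}")
        return gaps


if __name__ == "__main__":
    record = MeasurementMetadata(
        library=LibraryDetails(fragmented=False, protocol="full-length cDNA"),
        instrument=InstrumentDetails(platform="PacBio"),
    )
    print(record.missing())
```

Checking for None rather than falsiness matters here: fragmented=False is a real answer (an unfragmented library), not a gap.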

How do we handle the disruptions new measurement
technologies bring?
Long reads sequence unfragmented cDNA libraries; short reads are sequenced from
fragmented cDNA libraries.
Long reads capture full-length transcripts (5’ UTR to 3’ UTR), including complete
open reading frames, at the transcript level.
Measuring the transcriptome allows us to peer into the proteome; validation can
occur with peptides.
This sample-specific transcriptome contains alternatively spliced transcripts
specific to the sample collected, altering the gene model for that sample.
We need to capture the gene model in data commons for future reuse.
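Because a full-length read spans the whole transcript, its open reading frame can be read off directly and translated into the protein that peptide evidence would need to confirm. The sketch below is a deliberately simple illustration, not the author's pipeline: it assumes a forward-strand transcript sequence, uses the standard genetic code, and reports the longest ATG-to-stop ORF across the three forward frames. Names are hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def longest_orf(seq: str) -> Tuple[int, int, str]:
    """Return (start, end, protein) of the longest ATG..stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0, "") if no complete ORF exists.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as cDNA
    best = (0, 0, "")
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            protein, j, stop_found = [], i, False
            while j + 3 <= len(seq):
                aa = CODON_TABLE.get(seq[j:j + 3], "X")  # X for ambiguous bases
                if aa == "*":
                    stop_found = True
                    break
                protein.append(aa)
                j += 3
            if not stop_found:
                break  # no later start in this frame can reach a stop either
            if len(protein) > len(best[2]):
                best = (i, j + 3, "".join(protein))
            i = j + 3  # resume after the stop codon
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTTGAAA"))
```

A real sample-specific gene model would also consider the reverse strand and compare the translated isoform against a reference annotation; this sketch only shows why an unfragmented read makes the ORF directly observable.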

FAIR Data Action Plan (Preliminary Steps)
Interim recommendations and actions from the European Commission Expert
Group on FAIR Data
1. Define and apply FAIR appropriately
2. Develop and support a sustainable FAIR data ecosystem
3. Ensure FAIR data and certified services to support FAIR

FAIR Data Object (Core Bits)

[Diagram: A Typical Researcher's Path. Elements: Grant Award; Genome Technologies,
Imaging Services, Single Cell Services; Data Analysis (Repeat); Google Cloud Platform,
Docker, TCGA, ISB-CGC, JAX Pipelines API, Analysis Program with /mnt/input and
/mnt/output, URL results; Paper Writing & Acceptance; SRA and GEO; Tiers 1-3.]
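In the researcher's path, analysis programs run in Docker containers that read from /mnt/input and write to /mnt/output. One way to keep metadata from being lost at that step is to write a provenance record next to the outputs. The sketch below assumes only those two mount points from the slide; the provenance.json file name, its fields, and the function names are hypothetical and not part of the JAX Pipelines API.

```python
import hashlib
import json
import tempfile
from pathlib import Path

PROVENANCE_FILE = "provenance.json"  # hypothetical file name


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large FASTQ/BAM files are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(program, version, container_image,
                     input_dir="/mnt/input", output_dir="/mnt/output"):
    """Record which program and container ran, on which inputs, producing which outputs."""
    inputs, outputs = Path(input_dir), Path(output_dir)
    record = {
        "program": program,
        "version": version,
        "container_image": container_image,
        "inputs": {p.name: sha256_of(p) for p in sorted(inputs.iterdir()) if p.is_file()},
        "outputs": {
            p.name: sha256_of(p)
            for p in sorted(outputs.iterdir())
            if p.is_file() and p.name != PROVENANCE_FILE  # don't checksum ourselves
        },
    }
    (outputs / PROVENANCE_FILE).write_text(json.dumps(record, indent=2))
    return record


if __name__ == "__main__":
    # Temporary directories stand in for the container mounts in this demo.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        Path(src, "sample.fastq").write_bytes(b"@r1\nACGT\n+\nIIII\n")
        Path(dst, "junctions.tsv").write_text("chr1\t100\t200\t+\t5\n")
        print(json.dumps(write_provenance("example-tool", "0.1", "example/image:0.1", src, dst), indent=2))
```

Checksums tie each result to the exact input bytes, so a later "Repeat" of the analysis can confirm it started from the same data.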

[Diagram: Where is the metadata and where is it captured? The same researcher's path
(Grant Award, data generation services, Data Analysis, Paper Writing & Acceptance,
SRA and GEO; Tiers 1-3), annotated with the metadata needed along the way:]
- BioProject: What was the question being asked?
- Experimental Design: What tissue is being measured? How was the library
  constructed? At what time points were the data collected?
- SRA / BioSample: Raw FASTQ files stored; controlled-access data?
- Matrices: Junction counts by sample
- Instrument Details: Which version of the instrument? What chemistries?
- Sample Collection Details (affect quality): When and where were the samples
  collected?
- Library Construction Details: Fragmented or unfragmented libraries?
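The "Matrices: Junction Count by Sample" annotation refers to a processed data product with rows for splice junctions and columns for samples. The sketch below builds such a matrix, assuming per-sample junction counts have already been extracted (for example from an aligner's splice-junction output); the record layout and function name are hypothetical. Junctions not observed in a sample get a count of zero.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chromosome, intron start, intron end, strand)
Junction = Tuple[str, int, int, str]
# (sample, chromosome, start, end, strand, supporting read count)
JunctionRecord = Tuple[str, str, int, int, str, int]


def junction_count_matrix(
    records: Iterable[JunctionRecord],
) -> Tuple[List[Junction], List[str], List[List[int]]]:
    """Return (junction row labels, sample column labels, count matrix)."""
    counts: Dict[Junction, Dict[str, int]] = defaultdict(dict)
    samples: List[str] = []  # keep first-seen sample order for stable columns
    for sample, chrom, start, end, strand, n_reads in records:
        if sample not in samples:
            samples.append(sample)
        per_sample = counts[(chrom, start, end, strand)]
        per_sample[sample] = per_sample.get(sample, 0) + n_reads  # sum duplicate records
    junctions = sorted(counts)
    matrix = [[counts[j].get(s, 0) for s in samples] for j in junctions]
    return junctions, samples, matrix


if __name__ == "__main__":
    rows, cols, m = junction_count_matrix([
        ("s1", "chr1", 100, 200, "+", 5),
        ("s2", "chr1", 100, 200, "+", 3),
        ("s2", "chr1", 150, 300, "+", 7),
    ])
    print(cols)
    for junction, counts_row in zip(rows, m):
        print(junction, counts_row)
```

A zero in this matrix can mean "junction absent" or "library could not detect it", which is why the matrix is only interpretable alongside the library construction and instrument metadata listed above.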

NCI Cancer Research Data Commons

#datagovernancematters

Data Commons Data Management for Data Stewardship:
1. Data management plans are needed for all data produced.
2. We need metadata (data about our data), including instrument details.
3. We need to adhere to W3C standards such as RDF, describe data in data catalogs, and publish the data.
4. Ontologies should be used everywhere.
5. More metadata needs to be captured.
6. Data need to be FAIR for both humans and machines.
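To make a dataset findable "by machine" as well as by people, its catalog entry can be published with W3C vocabularies. The sketch below emits a minimal JSON-LD description using the W3C DCAT vocabulary and Dublin Core terms (dcat:Dataset, dcat:Distribution, dcat:keyword, dcat:downloadURL, dct:title, dct:identifier, dct:description, dct:format). It is a minimal illustration, not a complete DCAT profile; the function name and all example.org identifiers are hypothetical placeholders.

```python
import json
from typing import Dict, Iterable


def dcat_dataset(identifier: str, title: str, description: str,
                 keywords: Iterable[str], download_url: str, file_format: str) -> Dict:
    """Build a minimal DCAT dataset description as a JSON-LD-ready dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,                 # persistent identifier (placeholder here)
        "@type": "dcat:Dataset",
        "dct:identifier": identifier,
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},  # an IRI, not a plain string
            "dct:format": file_format,
        },
    }


if __name__ == "__main__":
    record = dcat_dataset(
        "https://example.org/dataset/0001",          # hypothetical identifier
        "Long-read cDNA sequencing, example sample",
        "Unfragmented cDNA library; see instrument and library metadata.",
        ["long-read sequencing", "transcriptome"],
        "https://example.org/files/0001.fastq.gz",   # hypothetical URL
        "FASTQ",
    )
    print(json.dumps(record, indent=2))
```

Because the output is JSON-LD, it can be loaded by RDF tooling as well as read directly, which serves both the "RDF" and "data catalogs" items in the list above.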
