NGS: How what we are measuring impacts
data models and implications for data
commons
Anne Deslattes Mays, PhD
Principal Computational Scientist
How to handle the disruption of new measurement
technologies in our data ecosystem?
What does this mean for data science?
How do we become better data stewards?
11/29/2018 BioDataWorld Congress - Basel
Disclaimer
This presentation was prepared by Anne Deslattes Mays, PhD, in her personal capacity.
The opinions expressed in this presentation are the author's own and not necessarily
the views and opinions of the Jackson Laboratory.
Talk Overview
1. Introduction to the Jackson Laboratory
2. What is Next Generation Sequencing Data used for today?
3. How do we handle the disruptions new measurement technologies bring?
4. What is Proper Data Stewardship for Data Science?
5. What does this mean for Data Commons?
6. How do we capture the context and precision of measurements?
The Jackson Laboratory (https://www.jax.org/)
To discover precise genomic solutions for disease and empower the
global biomedical community in the shared quest to improve
human health.
The Jackson Laboratory (https://www.jax.org/)
Recent News
JAX® Mice, Clinical and Research Services
1. > 10,000 mouse strains supporting biomedical research
2. > 80% of research publications citing mouse strains use JAX® Mice
3. > 30,000 peer-reviewed publications cite use of JAX® Mice
4. > 22,000 genetically diverse background strains cryopreserved
5. > 2,500 strains successfully cryorecovered by JAX each year
6. > 75 new models created with CRISPR on different genetic backgrounds
7. Every month, hundreds of publications reference JAX® Mice strains
JAX Clinical Genomics Laboratory (CGL) Offerings:
Sample Types Validated for Testing
- FFPE: Formalin Fixed Paraffin Embedded tissue (solid tumors)
- Cell Free DNA
- Whole Blood
- Buccal Swab
- Saliva
Cancer
Inherited Disorders
Honey Reddi, PhD, FACMG
Clinical Lab Director
Who Do We Serve?
- Clinicians
- Pharma + Biotech
- Academia + JAX PIs
Services:
- CLIA validated tests
- Research Assays
- Custom Assay Development
Assays for Confirmation of Variants
Nucleic Acid | Confirmatory Technology (research or orthogonal) | Types of Variants Identified
DNA | ddPCR ($570/sample) | SNPs, CNVs
DNA | Sanger ($400/sample) | SNPs, InDels
RNA | RT-PCR ($342/sample) | Fusions
48-60 samples/run, TAT of ~6 days if primers/probes are available in-house
Research Assays: PDX
A suite of assays for mutational and expression
analysis of PDX tissue, includes PDX filtering
Clinical Knowledge Base
Scientific Services at JAX
1. JAX-GM Cellular Engineering ✔️
2. Microbial Genomics
3. Single Cell Biology
4. Genome Technologies
5. Center for Biometric Analysis
6. PDX Research and Development
7. Microscopy Services
8. Flow Cytometry
9. Mass Spectrometry and Protein Chemistry
10. Monoclonal Antibody Services
✔️ = Using NGS Technologies
Gordon Bell Prize Super Computing 2018
750,000 human genome types, associated with more
than a billion medical records over a 20-year period.
Oxford Nanopore Offerings
Workman, Rachael E., et al. "Nanopore native RNA sequencing of a human
poly (A) transcriptome." bioRxiv (2018): 459529.
Human poly (A) transcriptome
PacBio vs Oxford Nanopore Sequencing
https://blog.genohub.com/2017/06/16/pacbio-vs-oxford-nanopore-sequencing/
PacBio Consensus Accuracy > 99%
raw PacBio reads also differ in error types (more indels than mismatches) and
have a much higher abundance (∼13–15%, Table 1), though they are spread
randomly across the reads (25,26). This randomness enables highly accurate
consensuses (>99%) to be built up rapidly by sequencing multiple times the
same molecule (CCS reads)
Simon Ardui, Adam Ameur, Joris R Vermeesch, Matthew S Hestand; Single molecule real-time (SMRT) sequencing comes of
age: applications and utilities for medical diagnostics, Nucleic Acids Research, Volume 46, Issue 5, 16 March 2018, Pages
2159–2168, https://doi.org/10.1093/nar/gky066
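The claim that randomly distributed errors allow a highly accurate consensus can be illustrated with a small calculation. The sketch below is not how CCS consensus calling is actually implemented; it assumes, for illustration only, that each pass over the molecule is wrong at a given base independently with probability 0.15 (the upper end of the range quoted above) and that the consensus is a simple majority vote. The function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(per_pass_error: float, passes: int) -> float:
    """Probability that a strict majority of independent passes is wrong at one base.

    Simplifying assumptions: errors are independent between passes and every
    erroneous pass counts against the true base (a pessimistic bound).
    """
    threshold = passes // 2 + 1  # smallest number of wrong passes that forms a majority
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(threshold, passes + 1)
    )


if __name__ == "__main__":
    # Odd pass counts avoid ties in the majority vote.
    for n in (1, 3, 5, 9, 15):
        err = majority_error(0.15, n)
        print(f"{n:2d} passes: estimated consensus accuracy {100 * (1 - err):.2f}%")
```

Under these assumptions the estimated per-base error falls quickly as the number of passes grows, which is the intuition behind the quoted statement.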
All measurements taken on biological samples are made within the context of
instrument limitations, procedures followed in preparing samples for
measurement and the condition and the context of the samples being
measured.
Raw result data, quality data, metadata and procedures used to transform
measurement data from the instrument and/or the experimental procedures
are best captured at the time of experimental design to aid in primary and
secondary processing.
Biological Samples Details Need Metadata
Library Construction Details Need Metadata
Instrument Details Need Metadata
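One way to picture the three kinds of metadata named above is as a single structured record created at experimental design time. The following is a minimal sketch, not a schema from the talk: every class name, field name, and value is a hypothetical placeholder chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BiologicalSample:
    sample_id: str
    tissue: str
    collected_on: str      # ISO 8601 date
    collection_site: str


@dataclass
class LibraryConstruction:
    protocol: str
    fragmented: bool       # fragmented vs unfragmented cDNA library


@dataclass
class Instrument:
    platform: str
    model_version: str
    chemistry: str


@dataclass
class MeasurementRecord:
    sample: BiologicalSample
    library: LibraryConstruction
    instrument: Instrument

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; none of these come from the presentation.
record = MeasurementRecord(
    sample=BiologicalSample("SAMPLE-001", "liver", "2018-11-01", "example site"),
    library=LibraryConstruction("full-length cDNA", fragmented=False),
    instrument=Instrument("long-read sequencer", "v1", "example-chemistry"),
)

if __name__ == "__main__":
    print(record.to_json())
```

Keeping the sample, library, and instrument details in one serializable record means they can travel with the raw data into primary and secondary processing.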
How do we handle disruptions new measurement
technologies bring?
Long Reads are sequenced from unfragmented cDNA libraries
Short Reads are sequenced from fragmented cDNA libraries
Capturing the full-length (5' UTR to 3' UTR) open reading frames at the
transcript level
Measuring the Transcriptome allows us to peer into the Proteome
Validation can occur with peptides
This Sample-Specific Transcriptome contains Alternatively Spliced Transcripts
specific to the sample collected, altering the gene model for that sample
We need to capture the gene model in Data Commons
for future reuse
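As an illustration of what capturing a sample-specific gene model could look like, the sketch below writes one transcript and its exons as GTF-style lines (nine tab-separated columns). The `sample_id` attribute, the `Transcript` class, the `to_gtf` helper, and all coordinates are assumptions made for this example; the talk does not prescribe a format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    gene_id: str
    transcript_id: str
    chrom: str
    strand: str
    exons: List[Tuple[int, int]]  # 1-based, inclusive (start, end) pairs


def to_gtf(t: Transcript, sample_id: str, source: str = "long_read") -> List[str]:
    """Return GTF-style lines: one transcript row followed by its exon rows."""
    # gene_id and transcript_id are standard GTF attributes; sample_id is a
    # hypothetical extra attribute tying the model to the sample it came from.
    attrs = (
        f'gene_id "{t.gene_id}"; transcript_id "{t.transcript_id}"; '
        f'sample_id "{sample_id}";'
    )
    start = min(s for s, _ in t.exons)
    end = max(e for _, e in t.exons)
    rows = [("transcript", start, end)] + [("exon", s, e) for s, e in sorted(t.exons)]
    return [
        "\t".join([t.chrom, source, feature, str(s), str(e), ".", t.strand, ".", attrs])
        for feature, s, e in rows
    ]


if __name__ == "__main__":
    # Made-up isoform with two exons, for illustration only.
    isoform = Transcript("GENE1", "GENE1.sampleA.1", "chr1", "+", [(100, 200), (300, 450)])
    print("\n".join(to_gtf(isoform, sample_id="sampleA")))
```

Storing the isoform structure alongside the sample identifier is one way a later user could reuse the alternatively spliced transcripts observed in that sample.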
FAIR Data Action Plan (Preliminary Steps)
Interim recommendations and actions from the European Commission Expert
Group on FAIR data
1. Define and apply FAIR appropriately
2. Develop and support a sustainable FAIR data ecosystem
3. Ensure FAIR data and certified services to support FAIR
FAIR Data Object (Core Bits)
A Typical Researcher's Path
[Figure: workflow diagram. Labels: Grant Award; Genome Technologies; Imaging Services; Single Cell Services; Data Analysis; Paper Writing & Acceptance; Repeat; Google Cloud Platform; Docker; TCGA; ISB-CGC; JAX Pipelines API; Analysis Program; URL; RESULTS; /mnt/input; /mnt/output; SRA; GEO; TIER 1, TIER 2, TIER 3]
Where is the metadata and where is it captured?
[Figure: the workflow diagram from the slide "A Typical Researcher's Path", annotated with the questions below]
- BioProject: What was the question being asked?
- Experimental Design: What tissue is being measured? How was the library constructed? At what time points were the data collected?
- SRA / BioSample: Raw FASTQ files stored; controlled access data?
- Matrices: Junction Count by Sample
- Instrument Details: Which version of the instrument? What chemistries?
- Sample Collection Details (affects quality): When and where were the samples collected?
- Library Construction Details: Fragmented or unfragmented libraries?
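The questions on this slide can be read as a checklist to run before data are deposited. The sketch below shows one possible check; the section names, field names, and the `missing_metadata` helper are hypothetical and only mirror the questions listed above.

```python
from typing import Dict, List

# Hypothetical required fields, one group per question area on the slide.
REQUIRED: Dict[str, List[str]] = {
    "bioproject": ["question"],
    "experimental_design": ["tissue", "library_construction", "time_points"],
    "sample_collection": ["collected_when", "collected_where"],
    "library": ["fragmented"],
    "instrument": ["version", "chemistry"],
}


def missing_metadata(record: Dict[str, dict]) -> List[str]:
    """List every required 'section.field' that is absent or empty in the record."""
    missing = []
    for section, fields in REQUIRED.items():
        values = record.get(section, {})
        for field in fields:
            # None and "" count as missing; False is a valid answer (e.g. unfragmented).
            if values.get(field) in (None, ""):
                missing.append(f"{section}.{field}")
    return missing


if __name__ == "__main__":
    # Placeholder submission with the chemistry left out on purpose.
    submission = {
        "bioproject": {"question": "example question"},
        "experimental_design": {
            "tissue": "liver",
            "library_construction": "full-length cDNA",
            "time_points": ["day 0"],
        },
        "sample_collection": {"collected_when": "2018-11-01", "collected_where": "example site"},
        "library": {"fragmented": False},
        "instrument": {"version": "v1"},
    }
    print(missing_metadata(submission))
```

A check like this makes gaps visible at submission time, when the people who know the answers are still available.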
NCI Cancer Research Data Commons
#datagovernancematters
Data Commons Data Management for Data Stewardship:
1. Data management plans are needed for the data produced
2. We need metadata (data about our data), including instrument details
3. We need to adhere to W3C standards (RDF, data catalogs) and publish data
4. Ontologies should be used everywhere
5. More metadata need to be captured
6. Data need to be FAIR for both humans and machines
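To make the point about W3C standards and data catalogs concrete, here is a minimal sketch that describes a dataset as JSON-LD using terms from the W3C DCAT vocabulary and Dublin Core. The helper name `dataset_jsonld` and the example.org identifiers are placeholders; which vocabulary terms a real data commons would require is not stated in the talk.

```python
import json


def dataset_jsonld(identifier: str, title: str, description: str, download_url: str) -> dict:
    """Build a small JSON-LD description of a dataset with one distribution."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
        },
    }


if __name__ == "__main__":
    # Placeholder identifiers; example.org is reserved for documentation.
    doc = dataset_jsonld(
        "https://example.org/dataset/1",
        "Example junction count matrix",
        "Junction counts by sample (illustrative entry only).",
        "https://example.org/files/junction_counts.tsv",
    )
    print(json.dumps(doc, indent=2))
```

Because the description is plain RDF serialized as JSON, it can be read by people and harvested by catalog software alike.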
THANK YOU!

No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 

BioData World Basel 2018

  • 1. NGS: How what we are measuring impacts data models and implications for data commons. Anne Deslattes Mays, PhD, Principal Computational Scientist
  • 2. How to handle the disruption of new measurement technologies in our data ecosystem?
  • 3. What does this mean for data science?
  • 4. How do we become better data stewards?
  • 5. Disclaimer: This presentation was prepared by Anne Deslattes Mays, PhD in her personal capacity. The opinions expressed in this presentation are the author's own and not necessarily the views and opinions of the Jackson Laboratory. 11/29/2018 BioDataWorld Congress - Basel
  • 6. Talk Overview: (1) Introduction to the Jackson Laboratory; (2) What is Next Generation Sequencing data used for today? (3) How do we handle the disruptions new measurement technologies bring? (4) What is proper data stewardship for data science? (5) What does this mean for Data Commons? (6) How do we capture the context and precision of measurements?
  • 7. The Jackson Laboratory (https://www.jax.org/): To discover precise genomic solutions for disease and empower the global biomedical community in the shared quest to improve human health.
  • 8. The Jackson Laboratory (https://www.jax.org/)
  • 9. Recent News
  • 10. JAX® Mice, Clinical and Research Services: (1) > 10,000 mouse strains supporting biomedical research; (2) > 80% of research publications citing mouse strains use JAX® Mice; (3) > 30,000 peer-reviewed publications cite use of JAX® Mice; (4) > 22,000 genetically diverse background strains cryopreserved; (5) > 2,500 strains successfully cryorecovered by JAX each year; (6) > 75 new models created with CRISPR on different genetic backgrounds; (7) every month, hundreds of publications reference JAX® Mice strains.
  • 11. JAX Clinical Genomics Laboratory (CGL) Offerings. Sample types validated for testing: FFPE (Formalin-Fixed Paraffin-Embedded) tissue (solid tumors), cell-free DNA, whole blood, buccal swab, saliva. Cancer; Inherited Disorders. Honey Reddi, PhD, FACMG, Clinical Lab Director
  • 12. Who Do We Serve? Clinicians Pharma + Academia Biotech + JAX PIs - CLIA validated tests - CLIA validated tests - Research Assays - Research Assays - Custom Assay Development
  • 13. Assays for Confirmation of Variants (nucleic acid: research or orthogonal confirmatory technology, variant types identified). DNA: ddPCR ($570/sample) for SNPs and CNVs; Sanger ($400/sample) for SNPs and InDels. RNA: RT-PCR ($342/sample) for fusions. 48-60 samples/run; TAT of ~6 days if primers/probes are available in-house.
  • 14. Research Assays: PDX. A suite of assays for mutational and expression analysis of PDX tissue; includes PDX filtering.
  • 15. Clinical Knowledge Base
  • 16. Scientific Services at JAX: (1) JAX-GM Cellular Engineering; (2) Microbial Genomics; (3) Single Cell Biology; (4) Genome Technologies; (5) Center for Biometric Analysis; (6) PDX Research and Development; (7) Microscopy Services; (8) Flow Cytometry; (9) Mass Spectrometry and Protein Chemistry; (10) Monoclonal Antibody Services.
  • 17. Scientific Services at JAX: (1) JAX-GM Cellular Engineering ✔️; (2) Microbial Genomics; (3) Single Cell Biology; (4) Genome Technologies; (5) Center for Biometric Analysis; (6) PDX Research and Development; (7) Microscopy Services; (8) Flow Cytometry; (9) Mass Spectrometry and Protein Chemistry; (10) Monoclonal Antibody Services. ✔️ = using NGS technologies.
  • 18. Gordon Bell Prize, Super Computing 2018
  • 19. Gordon Bell Prize, Super Computing 2018: 750,000 human genome types, associated with more than a billion medical records over a 20-year period.
  • 24. Oxford Nanopore Offerings
  • 25. Human poly(A) transcriptome. Workman, Rachael E., et al. "Nanopore native RNA sequencing of a human poly(A) transcriptome." bioRxiv (2018): 459529.
  • 26. Human poly(A) transcriptome. Workman, Rachael E., et al. "Nanopore native RNA sequencing of a human poly(A) transcriptome." bioRxiv (2018): 459529.
  • 27. PacBio vs Oxford Nanopore Sequencing. https://blog.genohub.com/2017/06/16/pacbio-vs-oxford-nanopore-sequencing/
  • 28. PacBio Consensus Accuracy > 99%. "raw PacBio reads also differ in error types (more indels than mismatches) and have a much higher abundance (∼13–15%, Table 1), though they are spread randomly across the reads (25,26). This randomness enables highly accurate consensuses (>99%) to be built up rapidly by sequencing multiple times the same molecule (CCS reads)." Simon Ardui, Adam Ameur, Joris R Vermeesch, Matthew S Hestand; Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics, Nucleic Acids Research, Volume 46, Issue 5, 16 March 2018, Pages 2159–2168, https://doi.org/10.1093/nar/gky066
  • 29. All measurements taken on biological samples are made within the context of instrument limitations, the procedures followed in preparing samples for measurement, and the condition and context of the samples being measured. Raw result data, quality data, metadata and the procedures used to transform measurement data from the instrument and/or the experimental procedures are best captured at the time of experimental design to aid in primary and secondary processing. Biological sample details need metadata; library construction details need metadata; instrument details need metadata.
  • 30. How do we handle the disruptions new measurement technologies bring? Long reads sequence unfragmented cDNA libraries; short reads are sequenced on fragmented cDNA libraries. Capturing the full-length (5' UTR to 3' UTR) open reading frames at the transcript level. Measuring the transcriptome allows us to peer into the proteome. Validation can occur with peptides. This sample-specific transcriptome contains alternatively spliced transcripts specific to the sample collected, altering the gene model for that sample. We need to capture the gene model in Data Commons for future reuse.
  • 31. FAIR Data Action Plan (Preliminary Steps): Interim recommendations and actions from the European Commission Expert Group on FAIR data
  • 32. FAIR Data Action Plan (Preliminary Steps): Interim recommendations and actions from the European Commission Expert Group on FAIR data. (1) Define and apply FAIR appropriately; (2) develop and support a sustainable FAIR data ecosystem; (3) ensure FAIR data and certified services to support FAIR.
  • 33. FAIR Data Object (Core Bits)
  • 34. A Typical Researcher's Path. [Diagram labels: Grant Award; Genome Technologies, Imaging Services, Single Cell Services; Data Analysis; Paper Writing & Acceptance; SRA, GEO; Repeat; TIER 1, TIER 2, TIER 3; Google Cloud Platform, Docker, TCGA, ISB-CGC, JAX Pipelines, API, Analysis Program, URL, RESULTS, /mnt/input, /mnt/output]
  • 35. Where is the metadata and where is it captured? (Diagram labels as on slide 34.) BioProject: What was the question being asked? Experimental Design: What tissue is being measured? How was the library constructed? At what time points were the data collected? SRA: BioSample: raw FASTQ files stored; controlled access data? Matrices: junction count by sample. Instrument Details: Which version of the instrument? What chemistries? Sample Collection Details (affects quality): When and where were the samples collected? Library Construction Details: fragmented or unfragmented libraries?
  • 36. NCI Cancer Research Data Commons
  • 37. #datagovernancematters
  • 38. Data Management for Data Stewardship (Data Commons): (1) Data management plans are needed for the data produced; (2) we need metadata (data about our data), including instruments; (3) we need to adhere to W3C standards, RDF, data catalogs, and publish data; (4) ontologies should be used everywhere; (5) more metadata need to be captured; (6) data need to be FAIR for humans and machines.
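
The metadata questions raised on slides 29 and 35 (sample, library construction and instrument details) can be sketched as a small record type with a completeness check. This is only an illustrative Python sketch: every class and field name below is a hypothetical choice made for this example, not a schema from the talk, the SRA, or any data commons.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List, Optional


@dataclass
class SampleDetails:
    # Hypothetical fields: what tissue, and when/where it was collected.
    tissue: Optional[str] = None
    collected_when: Optional[str] = None
    collected_where: Optional[str] = None


@dataclass
class LibraryDetails:
    # True for fragmented libraries, False for unfragmented ones.
    fragmented: Optional[bool] = None


@dataclass
class InstrumentDetails:
    # Hypothetical fields: instrument version and chemistry used.
    instrument_version: Optional[str] = None
    chemistry: Optional[str] = None


@dataclass
class RunMetadata:
    # The question being asked (project level), plus the three detail groups.
    question: Optional[str] = None
    sample: SampleDetails = field(default_factory=SampleDetails)
    library: LibraryDetails = field(default_factory=LibraryDetails)
    instrument: InstrumentDetails = field(default_factory=InstrumentDetails)


def missing_fields(record, prefix: str = "") -> List[str]:
    """Return dotted names of every field that is still None."""
    missing: List[str] = []
    for f in fields(record):
        value = getattr(record, f.name)
        name = prefix + f.name
        if is_dataclass(value):
            # Recurse into nested detail groups.
            missing.extend(missing_fields(value, name + "."))
        elif value is None:
            missing.append(name)
    return missing


example = RunMetadata(
    question="Which isoforms are expressed in this sample?",
    sample=SampleDetails(tissue="liver"),
    library=LibraryDetails(fragmented=False),
    instrument=InstrumentDetails(instrument_version="v1"),
)
print(missing_fields(example))
```

Running a check like this at experimental design time, rather than at submission, is one way to make the gaps visible before the data are generated.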

Editor's notes

  1. To discover precise genomic solutions for disease and empower the global biomedical community in the shared quest to improve human health. Founded in 1929, The Jackson Laboratory (JAX) is an independent, nonprofit biomedical research institution with more than 2,200 employees who are passionate about our mission. The Laboratory is a world leader in mammalian genetics and human genomics and is developing scientific breakthroughs and improved therapies with ever-greater precision and speed. We also educate current and future scientists and provide critical resources, data, tools, and services to researchers worldwide. JAX has its mammalian genetics headquarters in Bar Harbor, Maine including a National Cancer Institute-designated Cancer Center; a genomic medicine facility in Farmington, Conn. enabling translation of fundamental research into the clinic; and facilities in Ellsworth, Maine and Sacramento, Calif.
  2–3. Although both PacBio and Oxford Nanopore generate longer reads than short-read Illumina or Ion sequencing, the higher error rate of both the PacBio and Oxford Nanopore sequencers remains an issue that needs addressing. Whereas PacBio reads a molecule multiple times to generate high-quality consensus data, Oxford Nanopore can only sequence a molecule twice. As a result, PacBio generates data with lower error rates than Oxford Nanopore. PacBio has a slightly better overall performance for applications such as the discovery of transcriptome complexity and sensitive identification of isoforms. On the other hand, MinION provides higher throughput, as nanopores can sequence multiple molecules simultaneously. Hence, it is best suited for applications that require a larger amount of data.
  4. The European Union FAIR data action plan, published in June 2018, outlines the core bits of information that should be collected on data to make data meaningful. These include persistent and unique identifiers, open and documented formats for the transformation of that data, and the use of digital object identifiers (DOIs) or uniform resource identifiers (URIs) to enable stable links to objects and support for citations and reuse. Authors should be identified with unique identifiers (such as ORCIDs), as should projects (RAIDs), funders and associated research resources (RRIDs). The action plan goes on to state that open and documented formats for standards and code should be employed. While minimum metadata and documentation are necessary to accompany these core data bits and enable basic data discovery, richer information and provenance are necessary to understand why, when and by whom the data were created, and the data should be accompanied by an appropriate data usage license.
  5–6. One researcher's path: get a grant; order data-generating services (PDX, Single Cell, other Genomic Technology services); do some data analysis, possibly in the cloud. The data are archived, likely with metadata embedded in the file structure of the data (usually project, sample, sequencing data). Upon paper acceptance or ahead of it, your sequencing data, along with appropriate metadata, may need to be uploaded to the Sequence Read Archive (SRA) or to the Gene Expression Omnibus (GEO), which then loads it up to the SRA, and then this process is repeated. In repeating this process, different researchers arrange their work in different ways: the data and the metadata may be embedded in the directory structure, or there may be different MySQL databases around that contain each of the individual projects. It could be that the data are arranged so that all of a researcher's project information is available to that researcher, but sharing it with others is laborious and time-consuming. To be data driven, we need to access data across silos.
  7. RDF, ontologies, data catalogs, unique identifiers for protective name spaces. SPARQL allows a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns. We will build a Linked Data Layer, using tools where they make sense. SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
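
The triple patterns and optional patterns mentioned in the last note can be illustrated without an RDF library. The following is a toy in-memory matcher in plain Python; it is not SPARQL itself and not the API of any RDF tool, and the identifiers (`ex:run1`, `ex:instrument`, and so on) are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical (subject, predicate, object) facts about two sequencing runs.
TRIPLES: List[Triple] = [
    ("ex:run1", "ex:instrument", "ex:PacBio"),
    ("ex:run1", "ex:library", "unfragmented"),
    ("ex:run2", "ex:instrument", "ex:Nanopore"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            # A constant that does not match.
            return None
    return result


def solve(patterns: List[Triple], triples: List[Triple],
          bindings: Optional[List[Binding]] = None) -> List[Binding]:
    """Conjunction: every pattern must match under one shared binding."""
    current = [{}] if bindings is None else bindings
    for pattern in patterns:
        extended = []
        for b in current:
            for t in triples:
                m = match_pattern(pattern, t, b)
                if m is not None:
                    extended.append(m)
        current = extended
    return current


def solve_optional(required: List[Triple], optional: List[Triple],
                   triples: List[Triple]) -> List[Binding]:
    """Optional pattern: keep a required match even if the optional part fails."""
    results = []
    for b in solve(required, triples):
        extra = solve(optional, triples, [b])
        results.extend(extra if extra else [b])
    return results


rows = solve_optional(
    [("?run", "ex:instrument", "?inst")],
    [("?run", "ex:library", "?lib")],
    TRIPLES,
)
for row in rows:
    print(row)
```

A disjunction in this toy model is simply the concatenation of two `solve` results; a real linked data layer would delegate all of this to a SPARQL engine.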