© 2009 Illumina, Inc. All rights reserved.
Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb,
iSelect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
COMPANY CONFIDENTIAL – INTERNAL USE ONLY
Genome Informatics Alliance 2013
Defining Genomic Big Data and its Impact on Scientific Progress
2
From Whence We Came…
[Figure: example sequences ATGCCGTTT…, CCGGTTAAT…, GAATTGCAG…; example variants 6:A2567C, 12:C123T, 20:T4678A]
Data sizes shown: 30-40 TB, ~5 TB, 600 GB, ~20 GB
3
Genomic Big Data
Large amounts of data generated in genomics; multiple samples, size of data, etc.
Integration of digital data to enrich context of samples; DNA, RNA, methylation, time courses, spatial distributions with samples, …
Fusion of digital data and categorical data; combination rules (categories), extraction from unstructured inputs, …
Tools and techniques appropriate for resultant data sets; visualization, model building, exploration, …
Advances require data mining rather than the one-at-a-time hypothesis testing approaches of today
4
Genomic Big Data and Personal Genome Information
PERSONAL SEQUENCE (owned by individual/doctor)
Issued: 01 MAR 07; Recommended next check: 28 FEB 10
PGI id: 5910322 – 61215923014
RISK VARIANTS (approved for clinical use)
Human Genome; Clinical studies; Populations; Sequencing; Functional annotation
GENOMIC ANNOTATION (in public domain)
[Figure: chromosome 3, 12,300 to 12,400 (kb), PPARg]
Variant: C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu
Medical consequence: Associated with severe insulin resistance, diabetes mellitus, hypertension
Pharmacological consequence: Resistant to thiazolidinediones
CLINICAL DECISION
Consultation; Consent; Clinical assessment; Selected risk information
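The variant line on this slide packs several annotations into one colon-separated record. As a minimal sketch, it could be held in a small structure like the one below. The field names (chromosome, position, allele frequencies, gene, amino-acid change) are assumptions about what each slot means. The slide does not define the format, and `VariantAnnotation` and `parse_variant` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VariantAnnotation:
    # Field meanings are assumed from the slide's example, not a defined format.
    chromosome: str
    position: int
    allele_frequencies: dict
    gene: str
    amino_acid_change: str


def parse_variant(record: str) -> VariantAnnotation:
    """Parse a record shaped like 'C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu :'."""
    # Split on colons and drop the empty field left by the trailing colon.
    fields = [f.strip() for f in record.split(":") if f.strip()]
    chrom, pos, alleles, gene, change = fields
    # 'T0.7/C0.3' is read as allele letter followed by its frequency (assumption).
    freqs = {item[0]: float(item[1:]) for item in alleles.split("/")}
    return VariantAnnotation(
        chromosome=chrom.lstrip("C"),
        position=int(pos.replace(",", "")),
        allele_frequencies=freqs,
        gene=gene,
        amino_acid_change=change,
    )


variant = parse_variant("C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu :")
print(variant)
```

Keeping the record structured rather than as free text is what lets the medical and pharmacological consequences be attached to it as further fields.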
5
Platinum Genomes as a Truth Reference
Creating a catalogue of highly accurate SNPs, indels & SVs
Sequencing a 17-member, three-generation pedigree
– Ultra-deep sequencing improves sensitivity
– Leveraging inheritance information improves accuracy
– Data and results made publicly available
Identifying ultra-accurate genomic variants is enabling rapid improvements in technology and software
These data will allow us to assess accuracy for many FDA submissions
We are collaborating with NIST & CDC to develop a public resource for quantifying sequencing accuracy
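One way inheritance information can flag likely errors is a Mendelian consistency check on a parent-child trio. The sketch below illustrates that general idea only. The slides do not describe the actual Platinum Genomes method, and the function name, genotype encoding, and example sites are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    child_alleles = sorted(child)
    return any(sorted((m, f)) == child_alleles for m, f in product(mother, father))


# Hypothetical diploid genotype calls: site -> (child, mother, father).
calls = {
    "site_1": (("A", "G"), ("A", "A"), ("G", "G")),
    "site_2": (("G", "G"), ("A", "A"), ("A", "G")),
}

# Sites that violate Mendelian inheritance are candidates for calling errors.
flagged = [site for site, trio in calls.items() if not is_mendelian_consistent(*trio)]
print(flagged)
```

In a larger pedigree, the same test can be repeated across every parent-child trio, so a call has to agree with many relatives before it is trusted.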
6
Data Reduction Via Vertical Compression (NA12882)
Reduction from 40 Q-scores to 8 Q-scores is becoming accepted
Sequencing output is still increasing exponentially, so further compression is likely to be required
Platinum Genomes work suggests ~95% of the genome is consistently called (this 95% is known as the platinum regions)
Regions that are reliably called may not need 8 Q-score resolution
– we can reduce “well sequenced” regions to 2 Q-scores
Start with 8 Q-score bam file:
– Reduce the platinum regions to 2 Q-scores (keep non-platinum at 8 Q-scores)
– Reduce the platinum regions to 1 Q-score
– Whole genome 2 Q-score
– Reduce platinum region to 2 Q-scores but also keep original Q-scores of mismatches (MM) and anomalous reads
– ~40Gb (20Gb CRAM)
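The region-dependent reduction described on this slide can be sketched as follows: quality scores at positions inside platinum regions are collapsed to two values, and all other scores are left alone. This is an illustration only. The cutoff (20), the two representative values (10 and 35), the interval format, and the function names are assumptions, not the scheme used in the presented work, and the sketch does not cover the variant that keeps the original scores of mismatches and anomalous reads.

```python
from bisect import bisect_right


def in_intervals(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping half-open intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def quantize_two_level(q, cutoff=20, low=10, high=35):
    """Collapse a quality score to one of two representative values (assumed values)."""
    return high if q >= cutoff else low


def compress_read(qualities, ref_positions, platinum):
    """Reduce scores to 2 levels inside platinum regions; keep the others unchanged."""
    starts = [s for s, _ in platinum]
    ends = [e for _, e in platinum]
    return [
        quantize_two_level(q) if in_intervals(p, starts, ends) else q
        for q, p in zip(qualities, ref_positions)
    ]


# Hypothetical read: two bases inside the platinum interval [50, 200), two outside.
reduced = compress_read([37, 12, 27, 33], [100, 101, 250, 251], [(50, 200)])
print(reduced)
```

Fewer distinct score values means longer runs of identical symbols, which is what makes the quality string compress better.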
SNP calls per build (counts at >Q20 in parentheses)
Build | Total SNPs | SNPs diff genotype | Not called in Q-score compressed build | Not called in 8 Q-score build
8 Q-score | 3,735,575 (3,627,165) | - | - | -
8 Q-score technical replicate | 3,734,849 (3,626,485) | 45,584 (22,400) | 80,131 (29,211) | 79,405 (28,845)
Platinum Genome 2 Q-score | 3,732,568 (3,620,612) | 3,255 (161) | 3,417 (63) | 410 (127)
Platinum Genome 1 Q-score | 3,764,928 (3,626,468) | 4,002 (584) | 2,605 (75) | 31,958 (2,964)
Whole Genome 2 Q-score | 3,712,636 (3,598,400) | 25,175 (1,912) | 24,237 (166) | 1,298 (112)
Platinum 2 Q-score keep MM and anom. reads | 3,735,684 (3,627,226) | 197 (123) | 142 (35) | 251 (102)
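The unfiltered totals in the table can be cross-checked with simple arithmetic. The check assumes the last two columns count, respectively, SNPs present in the 8 Q-score build but missing from the listed build, and SNPs present in the listed build but missing from the 8 Q-score build. That reading is an assumption about the column meanings. Under it, each build's total should equal the baseline total minus the first count plus the second. The variable and function names are illustrative.

```python
BASELINE_TOTAL = 3_735_575  # total SNPs in the 8 Q-score build

# build: (total SNPs, SNPs with different genotype,
#         not called in this build, not called in the 8 Q-score build)
builds = {
    "8 Q-score technical replicate": (3_734_849, 45_584, 80_131, 79_405),
    "Platinum Genome 2 Q-score": (3_732_568, 3_255, 3_417, 410),
    "Platinum Genome 1 Q-score": (3_764_928, 4_002, 2_605, 31_958),
    "Whole Genome 2 Q-score": (3_712_636, 25_175, 24_237, 1_298),
    "Platinum 2 Q-score keep MM and anom. reads": (3_735_684, 197, 142, 251),
}


def expected_total(baseline, missing_here, missing_in_baseline):
    """Total implied by the baseline and the two 'not called' counts."""
    return baseline - missing_here + missing_in_baseline


consistent = {
    name: expected_total(BASELINE_TOTAL, lost, gained) == total
    for name, (total, _diff, lost, gained) in builds.items()
}

# Fraction of each build's SNPs whose genotype differs from the 8 Q-score build.
discordance = {name: diff / total for name, (total, diff, _l, _g) in builds.items()}

for name in builds:
    print(f"{name}: consistent={consistent[name]}, discordance={discordance[name]:.5f}")
```

All five rows satisfy the identity, which supports that reading of the columns. The discordance fractions make the rows easier to compare than raw counts.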
7
Faster Data – DNA to Result in <2 Days
Workflow: Sample, Sequence, Analyze, Annotate
PCR Free library: 4.5 hr
HiSeq2500: 27 hr
Isaac analysis overnight: 8 hr
12 core server, 64Gb RAM
Total: 40 hr
Fast turnaround is required for clinical applications
8
Importance of Non-coding Mutations – Bigger Data!
WGS reveals somatic mutations in the TERT gene promoter of melanoma patients
These mutations form a novel transcription factor binding motif
Recurrence in melanoma is as high as any known coding mutation
[Figure: TERT gene, positions -200 to +200]
Gene (mutation)     Incidence in melanoma
TERT (promoter)     52%
BRAF (V600E)        53%
CDKN2A              50%
NRAS (Q61R)         28%
TERT (coding)       1%
Horn et al. & Huang et al., Science 2013
9
Complexity of Data
10
Surveillance of Leukaemia (CLL) – More Data Complexity!
[Figure: event timeline (Birth, Diagnosis, Treatment, Treatment, Treatment, Death) with axis ticks 0 and 62 to 66, and sequencing time points a to e; abundance (0 to 250) of NORMAL and CLASS 1 to CLASS 4 populations at each time point, with an inset for time point c (0 to 5)]
Changing subclonal populations
“Remission” has disease
Schuh et al., Oxford
11
A Deeper Complexity of Genomic Data
12
Utility Requires Complex Composite Information
Annotate, Interpret, Disseminate
Allele Frequency in populations: www.1000genomes.org
Medical/Risk data (with expert review): Hgmd, pharmgkb
Genetic Variants: dbSNP
Functional Effects: ensembl.org, genome.ucsc.edu, encode.org
Disease association: genome.gov
ANNOTATED GENOME (gVCF), <1Gbyte
Ancestry; Tissue type; Risk; Carrier status; Diagnosis; Drug response
iPad; Plug and Play; Cloud
13
Genomic Big Data Ecosystems
[Figure: Apps; Public Genomic Databases; Users; EMR; Support & Engineering; Instruments]
14
Genomic Big Data Status
Researcher
Treatment choice
Clinician
Patient
Knowledge
Information
15
Challenges for this Meeting to Address
What data frameworks and models are required?
How will genomes (DNA, RNA, methylation states, etc.) be aggregated and compared?
How will collaboration and data sharing evolve?
Where will the technology go, and how must the community respond to leverage the benefits?
Brainstorming of ideas
Sessions from groups that have experience in many fields
Next steps!!
Actively participate and enjoy the entire experience!

Scott Kahn Genomic Big Data.gia.052913

  • 1. © 2009 Illumina, Inc. All rights reserved. Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners. COMPANY CONFIDENTIAL – INTERNAL USE ONLY Genome Informatics Alliance 2013 Defining Genomic Big Data and its Impact on Scientific Progress
  • 2. From Whence We Came…
      – Figure: sequence reads (ATGCCGTTT…, CCGGTTAAT…, GAATTGCAG…) and variant calls (6:A2567C, 12:C123T, 20:T4678A)
      – Data sizes shown: 30-40TB, ~5TB, 600GB, ~20GB
  • 3. Genomic Big Data
      – Large amounts of data generated in genomics: multiple samples, size of data, etc.
      – Integration of digital data to enrich context of samples: DNA, RNA, methylation, time courses, spatial distributions with samples, …
      – Fusion of digital data and categorical data: combination rules (categories), extraction from unstructured inputs, …
      – Tools and techniques appropriate for resultant data sets: visualization, model building, exploration, …
      – Advances require data mining rather than the one-at-a-time hypothesis testing approaches of today
  • 4. Genomic Big Data and Personal Genome Information
      – PERSONAL SEQUENCE (owned by individual/doctor): Issued: 01 MAR 07; Recommended next check: 28 FEB 10; PGI id: 5910322 – 61215923014
      – RISK VARIANTS (approved for clinical use)
      – Diagram labels: Human Genome, Clinical studies, Populations, Sequencing, Functional annotation
      – Locus axis: 3: 12,300 to 3: 12,400 (kb), PPARg
      – GENOMIC ANNOTATION (in public domain)
      – Variant: C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu
      – Medical consequence: Associated with severe insulin resistance, diabetes mellitus, hypertension
      – Pharmacological consequence: Resistant to thiazolidinediones
      – CLINICAL DECISION: Consultation, Consent, Clinical assessment, Selected risk information
  • 5. Platinum Genomes as a Truth Reference: Creating a catalogue of highly-accurate SNPs, indels & SVs
      – Sequencing a 17-member three-generation pedigree:
          – Ultra deep sequencing improves sensitivity
          – Leveraging inheritance information improves accuracy
          – Data and results made publicly available
      – Identifying ultra accurate genomic variants is enabling rapid improvements in technology and software
      – This data will allow us to assess accuracy for many FDA submissions
      – We are collaborating with NIST & CDC to develop a public resource for quantifying sequencing accuracy
  • 6. Data Reduction Via Vertical Compression (NA12882)
      – Reduction from 40 Q-scores to 8 Q-scores is becoming accepted
      – Sequencing output is still increasing exponentially, so further compression is likely to be required
      – Platinum Genomes work suggests ~95% of the genome is consistently called (this 95% is known as the platinum regions)
      – Regions which are reliably called may not need 8 Q-score resolution: "well sequenced" regions can be reduced to 2 Q-scores
      – Start with 8 Q-score BAM file:
          – Reduce the platinum regions to 2 Q-scores (keep non-platinum at 8 Q-scores)
          – Reduce the platinum regions to 1 Q-score
          – Whole genome 2 Q-score
          – Reduce platinum regions to 2 Q-scores but also keep original Q-scores of mismatches (MM) and anomalous reads
          – ~40Gb (20Gb CRAM)
      Table (each cell gives the count, with the >Q20 count in parentheses):
      Build | Total SNPs | SNPs diff genotype | Not called in Q-score compressed build | Not called in 8 Q-score build
      8 Q-score | 3,735,575 (3,627,165) | - | - | -
      8 Q-score technical replicate | 3,734,849 (3,626,485) | 45,584 (22,400) | 80,131 (29,211) | 79,405 (28,845)
      Platinum Genome 2 Q-score | 3,732,568 (3,620,612) | 3,255 (161) | 3,417 (63) | 410 (127)
      Platinum Genome 1 Q-score | 3,764,928 (3,626,468) | 4,002 (584) | 2,605 (75) | 31,958 (2,964)
      Whole Genome 2 Q-score | 3,712,636 (3,598,400) | 25,175 (1,912) | 24,237 (166) | 1,298 (112)
      Platinum 2 Q-score, keep MM and anom. reads | 3,735,684 (3,627,226) | 197 (123) | 142 (35) | 251 (102)
  • 7. Faster Data – DNA to Result in <2 Days
      – Workflow figure: Sample, Sequence, Analyze, Annotate
      – PCR Free library: 4.5 hr
      – HiSeq2500: 27 hr
      – Isaac analysis overnight: 8 hr (12 core server, 64Gb RAM)
      – Total: 40 hr
      – Fast turnaround is required for clinical applications
  • 8. Importance of Non-coding Mutations – Bigger Data!
      – WGS reveals somatic mutations in TERT gene promoter of melanoma patients
      – The mutations form a novel transcription factor binding motif
      – Recurrence in melanoma is as high as any known coding mutation
      – Figure: TERT gene, positions -200 to +200
      Table:
      Gene (mutation) | Incidence in melanoma
      TERT (promoter) | 52%
      BRAF (V600E) | 53%
      CDKN2A | 50%
      NRAS (Q61R) | 28%
      TERT (coding) | 1%
      Source: Horn et al. & Huang et al., Science 2013
  • 9. Complexity of Data
  • 10. Surveillance of Leukaemia (CLL) – More Data Complexity!
      – Figure: Event Timeline with ticks at 0, 62, 63, 64, 65, 66; events: Birth, Diagnosis, Treatment, Treatment, Treatment, Sequencing, Death
      – Figure: Changing subclonal populations; Abundance (0 to 250) against Time points a to e; series: NORMAL, CLASS 1, CLASS 2, CLASS 3, CLASS 4
      – “Remission” has disease
      Source: Schuh et al., Oxford
  • 11. A Deeper Complexity of Genomic Data
  • 12. Utility Requires Complex Composite Information
      – Workflow: Annotate, Interpret, Disseminate
      – Annotation sources:
          – Allele frequency in populations: www.1000genomes.org
          – Medical/risk data (with expert review): HGMD, PharmGKB
          – Genetic variants: dbSNP
          – Functional effects: ensembl.org, genome.ucsc.edu, encode.org
          – Disease association: genome.gov
      – ANNOTATED GENOME (gVCF): <1Gbyte
      – Interpretation outputs: Ancestry, Tissue type, Risk, Carrier status, Diagnosis, Drug response
      – Dissemination: iPad, Plug and Play, Cloud
  • 13. Genomic Big Data Ecosystems
      – Diagram components: Apps, Public Genomic Databases, Users, EMR, Support & Engineering, Instruments
  • 14. Genomic Big Data Status
      – Diagram labels: Researcher, Treatment choice, Clinician, Patient, Knowledge, Information
  • 15. Challenges for this Meeting to Address
      – What data frameworks and models are required?
      – How will genomes (DNA, RNA, methylation states, etc.) be aggregated and compared?
      – How will collaboration and data sharing evolve?
      – Where will the technology go, and how must the community respond to leverage the benefits?
      – Brainstorming of ideas
      – Sessions from groups that have experience from many fields
      – Next steps!!
      – Actively participate and enjoy the entire experience!
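
Slide 6 describes reducing base-quality resolution: 8 Q-score levels genome-wide, and only 2 levels inside the reliably called "platinum" regions. The sketch below illustrates that idea in plain Python. It is not the pipeline used for the slide's results. The bin edges, representative values, and the names `EIGHT_LEVEL`, `TWO_LEVEL`, `quantize`, `in_regions`, and `compress_read` are all assumptions made for illustration, since the slides do not give the actual bin boundaries or tooling.

```python
from bisect import bisect_right

# Illustrative bins only (assumed, not taken from the slides).
# Each scheme is (lower edges, representative values): a score q maps to
# the representative of the last bin whose lower edge is <= q.
EIGHT_LEVEL = ([0, 2, 10, 20, 25, 30, 35, 40], [0, 6, 15, 22, 27, 33, 37, 40])
TWO_LEVEL = ([0, 20], [10, 30])


def quantize(q, scheme):
    """Map a non-negative Phred score to its bin's representative value."""
    edges, reps = scheme
    return reps[bisect_right(edges, q) - 1]


def in_regions(pos, regions):
    """True if pos falls in any half-open [start, end) interval.

    `regions` must be sorted by start and non-overlapping.
    """
    starts = [start for start, _ in regions]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < regions[i][1]


def compress_read(start, quals, platinum):
    """Quantize one read's qualities: 2 levels inside platinum regions,
    8 levels elsewhere. `start` is the reference position of the first base
    (this sketch assumes an ungapped alignment)."""
    return [
        quantize(q, TWO_LEVEL if in_regions(start + i, platinum) else EIGHT_LEVEL)
        for i, q in enumerate(quals)
    ]


if __name__ == "__main__":
    platinum = [(100, 200)]  # hypothetical platinum interval
    print(compress_read(98, [38, 12, 38, 12], platinum))
```

The fourth variant on the slide, which keeps original Q-scores for mismatches and anomalous reads, would need per-base alignment information and is left out of this sketch.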