Agenda:
• Research Computing @ Arizona State University
• Program, Vision and Mission
• Emphasis on Open Source
• Evolution in Genomic Analysis (HPC > MRv2 > Spark)
J.A. Etchings
RC@ASU Innovation
Arizona State University has become the foundational
model for the “New American University”, a new
paradigm for the public Research University that
transforms higher education. ASU is committed to
Excellence, Access and Impact in everything that it
does.
Open-source Data Driven Infrastructure
Google | Open-source | Function
GFS | HDFS | Distributed file system
MapReduce | MapReduce | Batch distributed data processing
Bigtable | HBase | Distributed DB/key-value store
Protobuf/Stubby | Thrift & Avro | Data serialization/RPC
Pregel | Giraph | Distributed graph processing
Dremel/F1 | Impala | Scalable interactive SQL (MPP)
FlumeJava | Crunch | Abstracted data pipelines on Hadoop
In Memory | Spark | In Memory Computation
Data Intensive
TransCORE Framework Knowledge Engine
Context
Ontologies
Data Elements
Information Models
Middleware
Transact
Clinical Research
Life Science Research
Qualitative Research
Analytic
In-Memory Analysis
Genomic Data
Machine Learning
Meta-Data Management
Data Resources
Open Big Data File System
Relational Key/Value
HPC Parallel
HPC SMA
Transactional
Data Reservoir
Big Data Scratch Space
Internet 2 / SDN Connectivity
The entire human genome of a single man
3 billion letters, 262,000 printed pages, 3.3GB
@rikisabatini #TED2016
Clarification & Limitations:
• Yes, we can sequence a genome for $1000
– Unfortunately, this does not include analysis
• There are 3 billion base-pair positions, but a diploid genome carries two copies of each, so 6 billion bases of haploid sequence
– Half come from mom and half from dad, and assembling those haplotypes, especially identifying SNPs that lie on the same haplotype, is going to be instrumental in future medical advances
• Other limitations:
– Batch effects (in physical sequencing, in sequencing technology)
– Different software, different versions of software, and different infrastructure (Standardization Gap)
– Batch effects can significantly impede variant discovery (false positives are high)
“NEED TO FOCUS NOT ON BIG DATA,
BUT BIG ANSWERS”
Harper Reed – CTO Obama for America 2012
Tumors are not composed of identical cells:
There is likely extreme intratumor heterogeneity
Macro heterogeneity: > 10% frequency in the tumor
Micro heterogeneity: < 10% frequency in the tumor
Use simulations to ask:
• What are the population dynamics of cancer cell populations?
• What is the role of genetic drift in cancer initiation and progression?
• What is the extent of subclonal variation within a tumor at the time of diagnosis?
• Are resistant subclones present in a tumor before the start of therapy?
Model parameters and their values
• Probability of division, b_n, which depends on the fitness of each cell
• Mean selection coefficient, s, used to generate the exponential distribution of selection coefficients
s = [ 0.1; 0.01; 0.005 ]
• Average driver mutation rate per cell division, u
u = [ 10^-8; 10^-7; 10^-6; 10^-5 ]
• Generation time: average division time = 4 days*
*S. Jones et al. Comparative lesion sequencing provides insights into tumor evolution. PNAS (2008)
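The parameter values above define a grid of 3 x 4 = 12 combinations. A minimal sketch of that grid in Python follows, together with a draw from an exponential distribution with a given mean and a conversion from generations to years. The function names and the 365-day year are illustrative assumptions, not taken from the slides.

```python
import random
from itertools import product

# Values listed on the slide.
SELECTION_COEFFICIENTS = [0.1, 0.01, 0.005]      # mean selection coefficient s
DRIVER_MUTATION_RATES = [1e-8, 1e-7, 1e-6, 1e-5]  # driver mutation rate u
DIVISION_TIME_DAYS = 4                            # average division time


def parameter_grid():
    """Return every (s, u) combination, s varying slowest."""
    return list(product(SELECTION_COEFFICIENTS, DRIVER_MUTATION_RATES))


def sample_selection_coefficient(mean_s, rng):
    """Draw one selection coefficient from an exponential distribution
    whose mean is mean_s (rate parameter = 1 / mean)."""
    return rng.expovariate(1.0 / mean_s)


def generations_to_years(generations, division_time_days=DIVISION_TIME_DAYS):
    """Convert a number of cell generations to years.
    Assumption: a 365-day year."""
    return generations * division_time_days / 365.0


if __name__ == "__main__":
    rng = random.Random(0)
    for s, u in parameter_grid():
        print(s, u, sample_selection_coefficient(s, rng))
```

Each (s, u) pair would then drive one batch of simulated tumors.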
The model: A branching evolutionary process
The process starts in a single cell with one driver mutation. Each cell then does one of three things:
• Death, with probability 1 - b_n
• Division, with probability (1 - u) b_n
• Division + driver mutation, with probability u b_n
Timeline: a driver mutation arises; years later, a clone develops; years later, neoplastic progression starts.
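The three outcomes and their probabilities can be sketched as a small Python simulation. This is a simplified illustration, not the authors' code: it holds b_n constant for all cells, whereas the slides state that b_n depends on each cell's fitness, and the function names and size cap are hypothetical.

```python
import random

DEATH = "death"
DIVISION = "division"
DIVISION_WITH_DRIVER = "division+driver"


def event_probabilities(b_n, u):
    """Per-cell event probabilities as listed on the slide."""
    return {
        DEATH: 1.0 - b_n,
        DIVISION: (1.0 - u) * b_n,
        DIVISION_WITH_DRIVER: u * b_n,
    }


def sample_event(b_n, u, rng):
    """Pick one of the three events for a single cell."""
    r = rng.random()
    if r < 1.0 - b_n:
        return DEATH
    if r < (1.0 - b_n) + (1.0 - u) * b_n:
        return DIVISION
    return DIVISION_WITH_DRIVER


def simulate_clone(b_n, u, rng, max_cells=1000, max_generations=200):
    """Grow a clone from one cell in synchronous generations.

    Assumption: b_n is the same for every cell (no fitness effects).
    Returns (final cell count, number of driver-mutation events).
    """
    cells, drivers = 1, 0
    for _ in range(max_generations):
        if cells == 0 or cells >= max_cells:
            break
        next_cells = 0
        for _ in range(cells):
            event = sample_event(b_n, u, rng)
            if event == DEATH:
                continue
            next_cells += 2  # a dividing cell leaves two daughters
            if event == DIVISION_WITH_DRIVER:
                drivers += 1
        cells = next_cells
    return cells, drivers


if __name__ == "__main__":
    print(simulate_clone(0.55, 1e-5, random.Random(1)))
```

A fuller model would track a fitness value per subclone and update b_n whenever a new driver mutation arises.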
The model: A branching evolutionary process
≈ 98% of starting mutant clones die out early
Mean selection coefficient | Driver mutation rate per cell division | Number of realizations | Realizations that reached 10^9 cells | Percentage that reached 10^9 cells (%) | Average time to detection (years)
0.1 | 10^-8 | 10155 | 162 | 1.6 | 17.50
0.1 | 10^-7 | 1948 | 112 | 5.7 | 5.21
0.1 | 10^-6 | 748 | 134 | 17.9 | 1.74
0.1 | 10^-5 | 748 | 111 | 14.8 | 1.62
0.01 | 10^-8 | 6867 | 125 | 1.8 | 19.80
0.01 | 10^-7 | 6866 | 113 | 1.6 | 15.41
0.01 | 10^-6 | 6866 | 120 | 1.7 | 13.85
0.01 | 10^-5 | 6865 | 115 | 1.7 | 11.16
0.005 | 10^-8 | 11951 | 102 | 0.9 | 27.97
0.005 | 10^-7 | 11751 | 112 | 1.0 | 27.91
0.005 | 10^-6 | 11750 | 126 | 1.1 | 22.43
0.005 | 10^-5 | 11750 | 100 | 0.9 | 18.28
completed | | 88265 | 1432 | 1.6 |
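The percentages and the "completed" totals follow directly from the two count columns. A short Python check, using only the counts printed in the table (the variable and function names are illustrative):

```python
# (mean selection coefficient, realizations, realizations that reached 10^9 cells)
ROWS = [
    (0.1, 10155, 162), (0.1, 1948, 112), (0.1, 748, 134), (0.1, 748, 111),
    (0.01, 6867, 125), (0.01, 6866, 113), (0.01, 6866, 120), (0.01, 6865, 115),
    (0.005, 11951, 102), (0.005, 11751, 112), (0.005, 11750, 126), (0.005, 11750, 100),
]


def percent_reached(realizations, reached):
    """Share of realizations that grew to 10^9 cells, in percent."""
    return 100.0 * reached / realizations


def totals(rows):
    """Sum both count columns and compute the overall percentage."""
    total_runs = sum(row[1] for row in rows)
    total_reached = sum(row[2] for row in rows)
    return total_runs, total_reached, percent_reached(total_runs, total_reached)


if __name__ == "__main__":
    for s, runs, reached in ROWS:
        print(s, runs, reached, round(percent_reached(runs, reached), 1))
    print(totals(ROWS))
```

Overall, about 1.6% of realizations reached 10^9 cells, which is where the "≈ 98% die out early" figure comes from.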
Some tumors develop very quickly
(mimics childhood cancers)
Some tumors take decades to develop
(mimics many adult cancers, like melanoma)
Computationally Intensive
• Running until 10^9 cells was not efficient on a laptop
• Most tumors die out before reaching a detectable size
• Need to reduce run time and track all mutations and subclone sizes (massively)
eQTL Analysis
Generating trillions of hypothesis tests
• 10^7 loci x 10^4 phenotypes x 10s of tissues = 10^12 p-values
• Tested below on 120 billion associations
Example queries:
• “Given 5 genes of interest, find top 20 most significant eQTLs (cis and/or trans)”
o Finishes in several seconds
• “Find all cis-eQTLs across the entire genome”
o Finishes in a couple of minutes
o Limited by disk throughput
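The test count and the first example query can be illustrated in a few lines of Python. This is a toy in-memory sketch, not the system benchmarked in the talk: the record layout (gene, SNP, p-value) and the function names are assumptions made for illustration.

```python
import heapq


def count_tests(loci, phenotypes, tissues):
    """Number of hypothesis tests (one p-value per combination)."""
    return loci * phenotypes * tissues


def top_eqtls(records, genes, k=20):
    """Return the k most significant associations for the given genes.

    records: iterable of (gene, snp, p_value) tuples (hypothetical layout).
    Smaller p-values are more significant.
    """
    wanted = set(genes)
    hits = (rec for rec in records if rec[0] in wanted)
    return heapq.nsmallest(k, hits, key=lambda rec: rec[2])


if __name__ == "__main__":
    # 10^7 loci x 10^4 phenotypes x 10 tissues
    print(count_tests(10**7, 10**4, 10))
    demo = [
        ("G1", "rs1", 1e-5),
        ("G2", "rs2", 1e-9),
        ("G3", "rs3", 1e-3),
        ("G1", "rs4", 1e-7),
    ]
    print(top_eqtls(demo, ["G1", "G2"], k=2))
```

Because heapq.nsmallest streams over its input, the same pattern works without loading every association into memory at once.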
[Figure: bar chart of time taken in minutes (axis 0 to 1000) against number of cores (5, 10, 15) for eQTL-Cases and eQTL-Controls; legend: Cloudera, Hortonworks, MapR]
Bar labels as extracted, six per series (Cases/Controls at 5, 10, and 15 cores):
862/306, 473/168, 404/138
776/308, 474/166, 387/136
700/192, 332/125, 240/119
Map Reduce
HPC
Apache Spark
• Took a day to get a tumor to 10^7 cells
– (still 2 orders of magnitude too small)
• Converted code from MATLAB to Scala (Spark)
• Takes seconds to simulate a single tumor
• Ability to generate tens of thousands of possible tumors and thousands of measurable tumors, and to observe their dynamics
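Generating many tumors is a set of independent runs, one per random seed, which is what makes the workload easy to distribute. The sketch below shows that ensemble pattern in plain Python rather than Spark or Scala. It is not the authors' code: it uses a simplified two-outcome model (divide with probability b_n, otherwise die), a small size threshold, and hypothetical function names.

```python
import random


def reaches_threshold(b_n, threshold, rng, max_generations=500):
    """Simulate one clone from a single cell.

    Assumption: every cell divides with the same probability b_n and
    dies otherwise. Returns True if the clone reaches `threshold` cells.
    """
    cells = 1
    for _ in range(max_generations):
        if cells == 0:
            return False
        if cells >= threshold:
            return True
        # Each surviving cell is replaced by two daughters.
        cells = sum(2 for _ in range(cells) if rng.random() < b_n)
    return cells >= threshold


def fraction_reaching(b_n, threshold, n_tumors, seed=0):
    """Run n_tumors independent simulations, one seed each, and return
    the fraction that reach the threshold size."""
    hits = 0
    for i in range(n_tumors):
        rng = random.Random(seed + i)  # independent stream per tumor
        if reaches_threshold(b_n, threshold, rng):
            hits += 1
    return hits / n_tumors


if __name__ == "__main__":
    print(fraction_reaching(0.55, 64, 1000))
```

Since each tumor depends only on its own seed, the loop in fraction_reaching could be replaced by a parallel map over seeds.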
Standard Output
[Figure: 12 density plots of subclone size (number of cells; x-axis 10^2 to 10^10, y-axis density), one panel per combination of s and μd]
N per panel:
s | μd = 10^-8 | μd = 10^-7 | μd = 10^-6 | μd = 10^-5
0.1 | 162 | 112 | 134 | 111
0.01 | 125 | 113 | 120 | 115
0.005 | 102 | 112 | 126 | 100
[Figure: 6 density plots of resistant subclone size (number of cells; x-axis 10^2 to 10^10, y-axis density), one panel per combination of s and μd]
N per panel:
s | μd = 10^-6 | μd = 10^-5
0.1 | 134 | 111
0.01 | 120 | 115
0.005 | 126 | 100
Standard Output
41%: 1 driver mutation
10%: 2 driver mutations
19%: 2 driver mutations
Output to Tableau
Minor subclones that harbor mutations resistant to treatment can result in relapse
Response to vemurafenib (V600E BRAF inhibitor): 4 months on drug, 6 months on drug
N. Wagle et al., Journal of Clinical Oncology (2011)
Subclonal variation of simulated tumor-1 at diagnosis
s = 0.005, u = 10^-5 per cell division, and mean division time = 4 days
[Figure: population dynamics of cancer cells (number of cells vs. time in years) and subclonal composition]
Subclonal composition: 17% with 1 driver mutation; 80% with 2 driver mutations
Subclone with a resistance mutation: N = 2,682 cells
Resistant mutation rate = [value not recovered]
Subclonal variation of simulated tumor-2 at diagnosis
s = 0.01, u = 10^-5 per cell division, and mean division time = 4 days
[Figure: population dynamics of cancer cells (number of cells vs. time in years) and subclonal composition]
Subclonal composition: 41% with 1 driver mutation; 19% with 2 driver mutations; 10% with 2 driver mutations
Subclone with a resistance mutation: N = 224,502 cells
Resistant mutation rate = 10^-8
Conclusions:
• These results constitute an argument for the development and application of more sensitive
technologies for the detection of rare pre-existing subclones that might plant the seeds for
rapid clinical relapse.
• Based on the predicted extent of standing subclonal variation, drug-resistant subclones are
almost certain to exist before the initiation of treatment.
• Greater subclonal diversity in a tumor may predict a higher likelihood of pre-existing
resistance to any conceivable targeted therapy
• Subclonal diversity itself may be a marker of the potential to evolve drug resistance, and
therefore may be an important prognostic indicator
• Reducing the time to research output with Apache Spark increases the success probability of
targeted therapies
The extent of subclonal variation is predicted by number of distinct dominant clones
Diego Chowell (a,b), James Napier (c), Rohan Gupta (c), Karen S. Anderson (b,d), Carlo C. Maley (b,d,f,1), and Melissa A. Wilson Sayres (b,d,e,1)
(a) Mathematical, Computational and Modeling Sciences Center, (b) Biodesign Institute, (c) Research Computing Center, (d) School of Life Sciences, (e) Center for Evolution and Medicine, Arizona State University, Tempe, Arizona 85281, USA; (f) Center for Evolution and Cancer, University of California San Francisco, San Francisco, California 94158, USA
(1) To whom correspondence may be addressed.
E-mail: maley@asu.edu or melissa.wilsonsayres@asu.edu (wilsonsayreslab.org | @mwilsonsayres)
ASU

 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

ASU

  • 1. Agenda: • Research Computing @ Arizona State University • Program, Vision and Mission • Emphasis on Open Source • Evolution in Genomic Analysis (HPC > MRv2 > Spark) J.A. Etchings RC@ASU Innovation
  • 2. Arizona State University has become the foundational model for the “New American University”, a new paradigm for the public Research University that transforms higher education. ASU is committed to Excellence, Access and Impact in everything that it does.
  • 3.
  • 4. Open-source Data Driven Infrastructure (Data Intensive)

      | Google | Open-source | Function |
      |---|---|---|
      | GFS | HDFS | Distributed file system |
      | MapReduce | MapReduce | Batch distributed data processing |
      | Bigtable | HBase | Distributed DB/key-value store |
      | Protobuf/Stubby | Thrift & Avro | Data serialization/RPC |
      | Pregel | Giraph | Distributed graph processing |
      | Dremel/F1 | Impala | Scalable interactive SQL (MPP) |
      | FlumeJava | Crunch | Abstracted data pipelines on Hadoop |
      | In Memory | Spark | In Memory Computation |
  • 5. TransCORE Framework Knowledge Engine Context Ontologies Data Elements Information Models Middleware Transact Clinical Research Life Science Research Qualitative Research Analytic In-Memory Analysis Genomic Data Machine Learning Meta-Data Management Data Resources Open Big Data File System Relational Key/Value HPC Parallel HPC SMA Transactional Data Reservoir Big DataScratch Space Internet 2 / SDN Connectivity
  • 6.
  • 7. The entire human genome of a single man 3 billion letters, 262,000 printed pages, 3.3GB @rikisabatini #TED2016
  • 8. Clarification & Limitations: • Yes, we can sequence a genome for $1000 – Unfortunately, this does not include analysis • The haploid genome has about 3 billion base pairs, so each person carries about 6 billion across two haplotypes – Half come from mom and half from dad, and assembling those haplotypes (especially SNPs that are on the same haplotype) is going to be instrumental in future medical advances • Other limitations: – Batch effects (in physical sequencing, in sequencing technology) – Different software, different versions of software, and infrastructure (Standardization Gap) – Batch effects can significantly impede variant discovery (false positives are high)
  • 9. “NEED TO FOCUS NOT ON BIG DATA, BUT BIG ANSWERS” Harper Reed – CTO Obama for America 2012
  • 10. Tumors are not composed of identical cells: There is likely extreme intratumor heterogeneity Macro heterogeneity > 10 % frequency in the tumor Micro heterogeneity < 10 % frequency in the tumor
  • 11. Use simulations to ask: • What are the population dynamics of cancer cell populations? • What is the role of genetic drift in cancer initiation and progression? • What is the extent of subclonal variation within a tumor at the time of diagnosis? • Are resistant subclones present in a tumor before the start of therapy?
  • 12. Model parameters and their values • Probability of division, bₙ, which depends on the fitness of each cell • Mean selection coefficient, 𝑠, to generate the exponential distribution of selection coefficients: 𝑠 = [0.1; 0.01; 0.005] • Average driver mutation rate per cell division, 𝑢: 𝑢 = [10⁻⁸; 10⁻⁷; 10⁻⁶; 10⁻⁵] • Generation time: average division time = 4 days* (*S. Jones et al., Comparative lesion sequencing provides insights into tumor evolution, PNAS (2008))
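  • 13. The model: A branching evolutionary process. The process starts in a single cell with one driver mutation. At each step a cell undergoes one of three events: death (probability 1 − bₙ), division (probability (1 − u)·bₙ), or division + driver mutation (probability u·bₙ).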
  • 14. The model: A branching evolutionary process. [Timeline figure: driver mutation arises; a clone develops; neoplastic progression starts; stages are separated by "years later".]
  • 15. ≈ 98% of starting mutant clones die out early

      | Mean selection coefficient | Driver mutation rate per cell division | Number of realizations | Realizations that reached 10⁹ cells | Percentage that reached 10⁹ cells (%) | Average time to detection (years) |
      |---|---|---|---|---|---|
      | 0.1 | 10⁻⁸ | 10155 | 162 | 1.6% | 17.50 |
      | 0.1 | 10⁻⁷ | 1948 | 112 | 5.7% | 5.21 |
      | 0.1 | 10⁻⁶ | 748 | 134 | 17.9% | 1.74 |
      | 0.1 | 10⁻⁵ | 748 | 111 | 14.8% | 1.62 |
      | 0.01 | 10⁻⁸ | 6867 | 125 | 1.8% | 19.80 |
      | 0.01 | 10⁻⁷ | 6866 | 113 | 1.6% | 15.41 |
      | 0.01 | 10⁻⁶ | 6866 | 120 | 1.7% | 13.85 |
      | 0.01 | 10⁻⁵ | 6865 | 115 | 1.7% | 11.16 |
      | 0.005 | 10⁻⁸ | 11951 | 102 | 0.9% | 27.97 |
      | 0.005 | 10⁻⁷ | 11751 | 112 | 1.0% | 27.91 |
      | 0.005 | 10⁻⁶ | 11750 | 126 | 1.1% | 22.43 |
      | 0.005 | 10⁻⁵ | 11750 | 100 | 0.9% | 18.28 |
      | completed | | 88265 | 1432 | 1.6% | |

  • 16. Some tumors develop very quickly (mimics childhood cancers). [Same table as slide 15.]
  • 17. Some tumors take decades to develop (mimics many adult cancers, like melanoma). [Same table as slide 15.]
  • 18. Computationally Intensive • Running until 10⁹ cells was not efficient on a laptop • Most tumors die out before reaching a detectable limit • Need to (massively) reduce run time while tracking all mutations and subclone sizes
  • 19. eQTL analysis generates trillions of hypothesis tests • 10⁷ loci × 10⁴ phenotypes × 10s of tissues = 10¹² p-values • Tested below on 120 billion associations Example queries: • “Given 5 genes of interest, find top 20 most significant eQTLs (cis and/or trans)” o Finishes in several seconds • “Find all cis-eQTLs across the entire genome” o Finishes in a couple of minutes o Limited by disk throughput
  • 20. [Bar chart: time taken in minutes (y-axis, 0 to 1000) by number of cores (5, 10, 15) for eQTL-Cases and eQTL-Controls, comparing Map Reduce, HPC, and Apache Spark; the legend also lists Cloudera, Hortonworks, and MapR. Values as extracted, one series per group: 862, 306, 473, 168, 404, 138; 776, 308, 474, 166, 387, 136; 700, 192, 332, 125, 240, 119.]
  • 21. • Took a day to get a tumor to 10⁷ cells – (still 2 orders of magnitude too small) • Convert code from MATLAB to Scala (Spark) • Takes seconds to simulate a single tumor • Ability to generate tens of thousands of possible tumors and thousands of measurable tumors, with observed dynamics
  • 22. Standard Output. [Figure: twelve density plots of subclone size (number of cells, x-axis 10² to 10¹⁰), one panel per combination of 𝑠 = 0.1, 0.01, 0.005 and μd = 10⁻⁸, 10⁻⁷, 10⁻⁶, 10⁻⁵. Panel sample sizes as listed: N = 162, 112, 134, 111, 125, 113, 120, 115, 102, 112, 126, 100.]
  • 23. Standard Output. [Figure: six density plots of resistant subclone size (number of cells, x-axis 10² to 10¹⁰), one panel per combination of 𝑠 = 0.1, 0.01, 0.005 and μd = 10⁻⁶, 10⁻⁵. Panel sample sizes as listed: N = 111, 115, 100, 134, 120, 126.]
  • 24. Output to Tableau. [Chart labels: 41% 1 driver mutation; 10% 2 driver mutations; 19% 2 driver mutations.]
  • 25. Minor subclones that harbor mutations resistant to treatment can result in relapse. Response to vemurafenib (V600E BRAF inhibitor): 4 months on drug; 6 months on drug. N. Wagle et al., Journal of Clinical Oncology (2011)
  • 26. Subclonal variation of simulated tumor-1 at diagnosis (𝑠 = 0.005, u = 10⁻⁵ per cell division, mean division time = 4 days). [Figure: population dynamics of cancer cells (number of cells against time in years) and subclonal composition. Labels as extracted: subclone with a resistance mutation, N = 2,682 cells; resistant mutation rate = [value missing]; 17% 1 driver mutation; 80% 2 driver mutations.]
  • 27. Subclonal variation of simulated tumor-2 at diagnosis (𝑠 = 0.01, u = 10⁻⁵ per cell division, mean division time = 4 days). [Figure: population dynamics of cancer cells (number of cells against time in years) and subclonal composition. Labels as extracted: 19% 2 driver mutations; 10% 2 driver mutations; 41% 1 driver mutation; subclone with a resistance mutation, N = 224,502 cells; resistant mutation rate = 10⁻⁸.]
  • 28.
  • 29. Conclusions: • These results constitute an argument for the development and application of more sensitive technologies for the detection of rare pre-existing subclones that might plant the seeds for rapid clinical relapse. • Based on the predicted extent of standing subclonal variation, drug-resistant subclones are almost certain to exist before the initiation of treatment. • Greater subclonal diversity in a tumor may predict a higher likelihood of pre-existing resistance to any conceivable targeted therapy. • Subclonal diversity itself may be a marker of the potential to evolve drug resistance, and therefore may be an important prognostic indicator. • Reducing the time to research output with Apache Spark increases the success probability of targeted therapies.
  • 30. The extent of subclonal variation is predicted by number of distinct dominant clones. Authors: Diego Chowell (a,b), James Napier (c), Rohan Gupta (c), Karen S. Anderson (b,d), Carlo C. Maley (b,d,f,1), and Melissa A. Wilson Sayres (b,d,e,1). Affiliations: (a) Mathematical, Computational and Modeling Sciences Center, (b) Biodesign Institute, (c) Research Computing Center, (d) School of Life Sciences, (e) Center for Evolution and Medicine, Arizona State University, Tempe, Arizona 85281, USA; (f) Center for Evolution and Cancer, University of California San Francisco, San Francisco, California 94158, USA. (1) To whom correspondence may be addressed. E-mail: maley@asu.edu or melissa.wilsonsayres@asu.edu (wilsonsayreslab.org | @mwilsonsayres)
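
The branching process on slides 12 and 13 can be sketched in a few lines of Python. This is only an illustration, not the MATLAB or Scala code the slides refer to. Several pieces are assumptions that the slides do not state: the form of the division probability (here bₙ = 1 − 0.5·(1 − s)ⁿ for a cell with n driver mutations), a fixed selection coefficient instead of an exponential distribution with mean 𝑠, synchronous generations, exactly one mutated daughter per "division + driver mutation" event, and a small detection threshold so the sketch runs on a laptop. All function names are hypothetical.

```python
import random


def division_probability(n_drivers, s):
    """Division probability b_n for a cell with n driver mutations.

    Assumed form (not given in the slides): death probability 0.5 * (1 - s)^n.
    """
    return 1.0 - 0.5 * (1.0 - s) ** n_drivers


def simulate_clone(s, u, max_cells, max_generations, rng):
    """Simulate one realization of the branching process.

    Returns (reached_threshold, generations, counts), where counts maps
    the number of driver mutations to the number of cells carrying them.
    """
    counts = {1: 1}  # the process starts in a single cell with one driver mutation
    for generation in range(1, max_generations + 1):
        new_counts = {}
        for n, cells in counts.items():
            b = division_probability(n, s)
            for _ in range(cells):
                r = rng.random()
                if r >= b:
                    continue  # death, probability 1 - b_n
                if r < u * b:
                    # division + driver mutation, probability u * b_n:
                    # assume one daughter keeps n drivers, the other gains one
                    new_counts[n] = new_counts.get(n, 0) + 1
                    new_counts[n + 1] = new_counts.get(n + 1, 0) + 1
                else:
                    # plain division, probability (1 - u) * b_n
                    new_counts[n] = new_counts.get(n, 0) + 2
        counts = new_counts
        total = sum(counts.values())
        if total == 0:
            return False, generation, counts  # clone died out
        if total >= max_cells:
            return True, generation, counts  # clone reached the threshold
    return False, max_generations, counts


def fraction_reaching(s, u, realizations, max_cells, seed=0, max_generations=5000):
    """Fraction of realizations whose clone reaches max_cells cells."""
    rng = random.Random(seed)
    reached = sum(
        simulate_clone(s, u, max_cells, max_generations, rng)[0]
        for _ in range(realizations)
    )
    return reached / realizations


def generations_to_years(generations, division_days=4.0):
    """Convert generations to years using the 4-day mean division time."""
    return generations * division_days / 365.0


if __name__ == "__main__":
    # Threshold of 500 cells is a toy value; the slides use 10^9 cells.
    frac = fraction_reaching(s=0.1, u=1e-5, realizations=300, max_cells=500)
    print(f"fraction of clones reaching the threshold: {frac:.3f}")
```

Looping over every cell is the reason a run to 10⁹ cells is impractical on a laptop. Each realization is independent of the others, which is what makes the workload easy to spread across a cluster.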

Editor's Notes

  1. Quick Facts: Founded in 1885 as the Territorial Normal School; renamed to Arizona State University in 1958; classified as a Research I institute in 1994; largest public university in the United States by enrollment; 83K students enrolled in academic year 2013-2014; 20K degrees completed; ranked #4 in the world for US patents among universities without a medical school; research expenditures = $405 million in 2013. Currently, Arizona State University is ranked among the Top 25 research institutes in the U.S. in terms of research output, innovation, development, research expenditures, number of awarded patents, and awarded research grant proposals. ASU is measured not by who it excludes but by whom it includes.
  2. Organizational Chart Updated 10/20/2015
  3. Mostly through the Apache Software Foundation
  4. Transdisciplinary Common Ontological Representational Framework
  5. Hybridized Cloud model All elements (once siloed) now exist on a seamless fabric without need for complicated ETL mechanisms
  6. It should be noted that although we can sequence a human genome for $1000, this does not include any analysis of it. The haploid genome has about 3 billion base pairs, so each person carries about 6 billion across two haplotypes (half come from mom and half from dad, and assembling those haplotypes, especially SNPs that are on the same haplotype, is going to be instrumental in future medical advances). Other limitations: batch effects (in physical sequencing, in sequencing technology, in using different software, and even different versions of software). Batch effects can significantly impede variant discovery (false positives are high).
  7. Given that the detectable tumor burden is estimated to be approximately 10⁹ tumor cells at the time of diagnosis, the level of resolution of conventional DNA-sequencing methods is clearly insufficient to assess pre-existing rare subclones that may harbor resistant mutations before therapy.
  8. Given that the detectable tumor burden is estimated to be approximately 10⁹ tumor cells at the time of diagnosis, the level of resolution of conventional DNA-sequencing methods is clearly insufficient to assess pre-existing rare subclones that may harbor resistant mutations before therapy.
  9. Pressure Testing with HPC, MRv1 and Apache Spark
  10. In this scenario, treatment usually removes the dominant sub-clones, shifting the evolutionary landscape in favor of one or more of the rare sub-clones, and allowing these treatment-resistant clones to thrive.
  11. The frequency of clonal mutations has been examined comprehensively for most cancer types, whereas the extent of subclonal heterogeneity within the DNA-sequences of individual tumors has not. Greater subclonal diversity in a tumor may predict a higher likelihood of pre-existing resistance to any conceivable targeted therapy.
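
The resolution argument in notes 7 and 8 can be made concrete with a rough calculation. The Python sketch below takes the two resistant subclone sizes reported on the slides (2,682 and 224,502 cells) and a tumor of about 10⁹ cells, then estimates how likely a sequencing run is to produce even one read that carries the resistance mutation. The following are assumptions for illustration, not values from the slides: a pure tumor sample, a diploid locus with one mutated copy, independent reads with no sequencing error, and the read depths 100 and 1000. The function names are hypothetical.

```python
from math import comb


def variant_allele_fraction(subclone_cells, tumor_cells, copies_mutated=1, ploidy=2):
    """Expected fraction of reads at the locus that carry the mutation.

    Assumes a pure tumor sample and that each subclone cell carries
    copies_mutated mutated copies out of ploidy copies of the locus.
    """
    return (subclone_cells / tumor_cells) * (copies_mutated / ploidy)


def prob_at_least_k_reads(vaf, depth, k):
    """Probability of at least k mutant reads among depth independent reads."""
    p_fewer = sum(
        comb(depth, i) * vaf ** i * (1.0 - vaf) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    tumor_cells = 10 ** 9  # approximate detectable tumor burden at diagnosis
    for label, cells in [("tumor-1", 2_682), ("tumor-2", 224_502)]:
        vaf = variant_allele_fraction(cells, tumor_cells)
        for depth in (100, 1000):  # hypothetical read depths
            p = prob_at_least_k_reads(vaf, depth, k=1)
            print(f"{label}: VAF={vaf:.3e}, depth={depth}, P(>=1 mutant read)={p:.3e}")
```

Under these assumptions a single mutant read is already unlikely, and real variant callers need several supporting reads to separate a variant from sequencing error, so the practical detection limit is stricter still.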