11. Sequencing Data Processed / $
Cost of processing all DNA sequenced in a year is growing exponentially year-over-year!
[Chart, 2018: Sequencing Data / $ and Sequencing Data Processed / $]
12. Challenge #1: Complex Pipelines
Complex Genomic Pipelines
Costly and time consuming
Annotation
Alignment
Variant Calling
Quality Control
BWA
Analysis
Raw Data
13. Challenge #2: Rigid Analytics
Rigid Analytics
Reduced Scope of Research
14. Challenge #3: Siloed Teams
Siloed Teams
Lack of Productivity
Researchers and Clinicians
Bioinformatics Teams
Computational Biologists
15. Solution #1: Prebuilt Pipelines
[Diagram: Raw Data, Analyses]
Packaged Workflows and Tools powered by Databricks Runtime
“One click” execution
Best Practice Pipelines
16. Solution #1: Prebuilt Pipelines
Processing time, 30x Coverage Whole Genome (GVCF):
Edico: 2:29:23
Databricks: 0:39:23
3.8x faster than industry leader Edico
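As a quick sanity check on the headline figure, the two processing times on this slide can be converted to seconds and divided. This is only a minimal sketch: it assumes that 2:29:23 is the Edico run and 0:39:23 is the Databricks run, and `to_seconds` and `speedup` are hypothetical helper names introduced here for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster the candidate run is than the baseline."""
    return to_seconds(baseline) / to_seconds(candidate)


# Assumption: the longer time is the baseline, the shorter one is the faster run.
ratio = speedup("2:29:23", "0:39:23")
print(f"{ratio:.1f}x faster")
```

Under that assumption the ratio rounds to the 3.8x quoted on the slide.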
17. Solution #2: Powerful Analytics
From interactive queries to AI
Powerful Analytics
18. Solution #2: Powerful Analytics
“Having the data is the first step, enabling drug development teams to answer questions with the data is how we are building the future of drug discovery.”
Dr. Jeff Reid, Exec Dir at Regeneron
“Queries on 60B+ genome associations in 3 seconds vs. 30 minutes”
19. Solution #3: Collaborative Workspaces
Lack of Productivity → Dramatically Improve Productivity
Collaborative Workspaces
Researchers and Clinicians
Bioinformatics Teams
Computational Biologists
20. Solution #3: Collaborative Workspaces
“Databricks allows us to take clinical research and turn it into a clinically validated screen in far less time.”
Sr. Director of Computational Bioinformatics, Lynn Carmichael
21. Unified Analytics Platform for Genomics
All Your Genomic Data
Visualizations
Machine Learning
Best Practice Pipelines
Tertiary Analytics and AI at Scale
Collaborative Workspaces
Genomic Analytics (e.g. GWAS, eQTL)
22. Unified Analytics Platform for Genomics
Genomics-specific optimizations increase performance by up to 100x
27. Typical patient intake and treatment
Identify → Diagnose → Treat
...but this is very reactive and costly.
28. Typical patient intake and treatment
By the age of 15, over 30% of Europeans
will develop a chronic disease
29. Let’s shift our thinking
What if we could identify an individual’s risk for developing a disease and prevent that disease before it ever occurs?
31. The preventative care process
Predict → Prevent
Accelerated treatment improves outcomes
32. The preventative care process
Huge opportunity for genomics
33. But genomic analysis is really hard
Population Scale Data Arrives (e.g. Biobank)
Process for Analysis
Export Model and Apply to Individual
Generate Dashboard for Clinician
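The four stages on this slide can be sketched end to end in a few lines. This is an illustrative toy only, not the method shown in the talk: the cohort is synthetic data made up here, the per-variant weights are a simple case-minus-control mean difference standing in for a real association analysis, and every function name (`process`, `export_model`, `apply_model`, `dashboard_line`) is hypothetical.

```python
import json
from statistics import mean

# Stage 1: population-scale data arrives.
# Synthetic stand-in: allele counts (0/1/2) for three variants plus case status.
cohort = [
    {"genotype": [2, 0, 1], "case": True},
    {"genotype": [1, 0, 2], "case": True},
    {"genotype": [0, 1, 0], "case": False},
    {"genotype": [0, 2, 1], "case": False},
]


def process(people):
    """Stage 2: derive one weight per variant (mean in cases minus mean in controls)."""
    cases = [p["genotype"] for p in people if p["case"]]
    controls = [p["genotype"] for p in people if not p["case"]]
    return [mean(c) - mean(k) for c, k in zip(zip(*cases), zip(*controls))]


def export_model(weights):
    """Stage 3a: serialize the model so it can leave the analysis environment."""
    return json.dumps({"weights": weights})


def apply_model(model_json, genotype):
    """Stage 3b: score a single individual with the exported model."""
    weights = json.loads(model_json)["weights"]
    return sum(w * g for w, g in zip(weights, genotype))


def dashboard_line(patient_id, score):
    """Stage 4: format one line of a clinician-facing summary."""
    return f"{patient_id}: risk score {score:+.2f}"


weights = process(cohort)
model = export_model(weights)
score = apply_model(model, [1, 1, 2])  # hypothetical individual
print(dashboard_line("patient-001", score))
```

Even in this toy form, each stage hands a different artifact to the next (raw records, weights, a serialized model, a formatted summary), which is where the hand-offs between tools and teams tend to add friction at population scale.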
34. Let’s try this with the Databricks Unified Analytics Platform for Genomics...