Lukas Habegger, Associate Director Bioinformatics
Regeneron Genetics Center (RGC)
Insights from Building the
Future of Drug Discovery with
Apache Spark
#EntSAIS14
Outline
• Current state of drug discovery and development
• Benefits of leveraging human genetics data
• Overview of the Regeneron Genetics Center (RGC)
• Challenges on the road to delivering on the promises of big data and genomics in drug discovery
• Overview of how the RGC leverages Databricks’ Unified Analytics Platform and Apache Spark
• Discussion of key engineering innovations
• Conclusions & lessons learned
Current state of drug discovery and development:
Maximizing chances of success with human genetics
• 95% of experimental medicines fail in development; costs exceed $2B per approved drug
• Higher probability of success for drugs with strong human genetics evidence
• >$100B spent on worldwide R&D by biopharma industry → only 10–20 new drugs per year
• Target bottleneck: <1,000 genes (<5% of all genes) account for targets of all drugs currently in development
Herper M. Forbes.com. The Truly Staggering Cost of Inventing New Drugs. https://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/#355471a54a94. Feb. 10, 2012.
Herper M. Forbes.com. How the Staggering Cost of Inventing New Drugs Is Shaping the Future of Medicine. https://www.forbes.com/sites/matthewherper/2013/08/11/how-the-staggering-cost-of-inventing-new-drugs-is-shaping-the-future-of-medicine/#30f1a95113c3. Aug. 11, 2013.
Booth B. Forbes.com. A Billion Here, A Billion There: The Cost of Making a Drug Revisited. https://www.forbes.com/sites/brucebooth/2014/11/21/a-billion-here-a-billion-there-the-cost-of-making-a-drug-revisited/#6034e7f226a8. Nov. 21, 2014.
Nat Genet. 2015 Aug;47(8):856-60. doi: 10.1038/ng.3314.
Nat Rev Drug Discov. 2013 Aug;12(8):581-94. doi: 10.1038/nrd4051.
Nat Rev Drug Discov. 2017 Jan;16(1):19-34. doi: 10.1038/nrd.2016.230.
You cannot pursue modern drug discovery and development without incorporating human genetics
Why is human genetics such a powerful tool for drug
discovery?
[Figure: a DNA mutation (example: A → T) has an impact on the gene product (loss-of-function, neutral, or gain-of-function), which in turn has an impact on disease (protective, neutral, or damaging). The final build of the slide adds a "Drug" label alongside loss-of-function.]
PCSK9: A success story where human genetics
evidence played a key role in drug development
[Figure: DNA mutation (example: A → T) → impact on gene product (loss-of-function, neutral, gain-of-function) → impact on disease (protective, neutral, damaging), with a "Drug" label alongside loss-of-function.]
• Loss-of-function mutations in PCSK9 protect against heart disease
• Regeneron developed a drug to block PCSK9, which has been shown to be effective in preventing heart disease
The goal of the RGC is to build one of the world’s largest
genotype-phenotype resources
• Regeneron has a long history of commitment to genetics-based science, and a track record of
integrating human genetics into development programs, delivering new medicines to patients
• Regeneron established the Regeneron Genetics Center (RGC) in 2014
• Goal: build one of the world’s most comprehensive genetics databases to supplement our
state-of-the-art drug development pipeline
• To date, the RGC has sequenced DNA from more than 300,000 individuals
Breadth of human genetics resources: RGC network of
60+ collaborators representing over 1 million samples
Founder populations
Phenotype specific cohorts
Family studies
General population
RGC collaboration with UK Biobank: RGC will sequence
~500K participants over 3-5 years
Automation is key to enable large-scale data production
and analysis
• Automated biobank (1.4M samples)
• Library preparation (>300,000 samples / year)
• Sequencing instruments (>300,000 samples / year)
• 100% cloud-based informatics & analysis
A scalable informatics platform is needed to analyze this data and make it accessible to a broad set of users
How do we analyze our data to gain novel insights?
Approach and desired goal
• Approach:
1. Sequence a large number of individuals to
identify their mutations
2. Obtain paired clinical data (traits derived from
de-identified electronic medical records)
3. Test for correlations/associations between all
mutations and traits
4. Mine association results in various ways to
gain insights
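The four steps above can be sketched on toy data. This is a minimal illustration, not the RGC's analytical engine: it assumes a squared-off mutation matrix and trait matrix keyed by the same individuals, and it uses a plain Pearson correlation as a stand-in for whatever association test is actually run. All names and values (`mutation_matrix`, `trait_matrix`, `associate`, `mutA`, `trait1`) are hypothetical.

```python
from itertools import product
from math import sqrt

# Toy mutation matrix (MM): individual -> {mutation: allele count 0/1/2}.
# All identifiers and values are made up for illustration.
mutation_matrix = {
    "ind1": {"mutA": 0, "mutB": 1},
    "ind2": {"mutA": 1, "mutB": 1},
    "ind3": {"mutA": 2, "mutB": 0},
    "ind4": {"mutA": 1, "mutB": 0},
}

# Toy trait matrix (TM): individual -> {trait: measured value}.
trait_matrix = {
    "ind1": {"trait1": 10.0},
    "ind2": {"trait1": 8.0},
    "ind3": {"trait1": 6.0},
    "ind4": {"trait1": 8.0},
}


def pearson(xs, ys):
    """Pearson correlation; returns None when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def associate(mm, tm):
    """Test every mutation against every trait (the MM x TM cross join)."""
    individuals = sorted(set(mm) & set(tm))
    mutations = sorted({m for i in individuals for m in mm[i]})
    traits = sorted({t for i in individuals for t in tm[i]})
    results = {}
    for mutation, trait in product(mutations, traits):
        xs = [mm[i].get(mutation, 0) for i in individuals]
        ys = [tm[i][trait] for i in individuals]
        results[(mutation, trait)] = pearson(xs, ys)
    return results


# Association results (AR): one entry per mutation : trait pair.
association_results = associate(mutation_matrix, trait_matrix)
for (mutation, trait), r in sorted(association_results.items()):
    print(mutation, trait, r)
```

The output has one row per mutation : trait pair, which is why the number of association results grows with the product of the two matrix dimensions.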
[Figure: desired goal. A mutation matrix (MM: individuals × mutations) and a trait matrix (TM: individuals × traits) feed an analytical engine that produces association results (AR: mutation : trait).]
How do we analyze our data to gain novel insights?
It’s more complicated – lack of data unification
[Figure: desired goal versus reality. Desired goal: a mutation matrix (MM: individuals × mutations) and a trait matrix (TM: individuals × traits) feed an analytical engine that produces association results (AR: mutation : trait). Reality: the same data sits in separate pVCF and txt files, including association results files.]
• Data is decentralized and stored in different formats
• Data is organized in different ways (e.g., not squared off, transposed, custom representations and indexing schemes)
• Asking simple questions requires many time-consuming data wrangling steps
How do we analyze our data to gain novel insights?
It’s more complicated – data from multiple cohorts
[Figure: desired goal versus reality. Desired goal: MM and TM feed an analytical engine that produces AR. Reality: multiple GT, MM, TM, and AR data sets held in pVCF and txt files.]
• The RGC has data from multiple collaborators
• Data is not always consistent
• Limited functionality to unify / aggregate matrices from multiple cohorts
How do we analyze our data to gain novel insights?
It’s more complicated – scalability issues
[Figure: desired goal versus reality. MM and TM feed an analytical engine that produces AR; the figure is annotated with scales of 10s of millions, 10s of thousands, and 100s of billions.]
• Large inputs (MM & TM)
• MM x TM cross join
• Massive outputs (AR)
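A back-of-the-envelope calculation shows why the cross join dominates. The sketch below assumes the slide's scale annotations refer to mutations (10s of millions), traits (10s of thousands), and association results (100s of billions), and takes the low end of each range. The 50 bytes per result is a purely hypothetical figure used only to give a feel for storage size.

```python
# Low end of "10s of millions" of mutations (assumed reading of the slide).
n_mutations = 10_000_000
# Low end of "10s of thousands" of traits (assumed reading of the slide).
n_traits = 10_000

# Every mutation is tested against every trait, so the output size is the
# product of the two dimensions.
n_results = n_mutations * n_traits

# Hypothetical storage cost per association result; not from the source.
bytes_per_result = 50
total_terabytes = n_results * bytes_per_result / 1e12

print(f"{n_results:,} association results")
print(f"about {total_terabytes:.0f} TB at {bytes_per_result} bytes per result")
```

Even at the low end of each range the product lands at 100 billion rows, which is consistent with the "100s of billions" annotation.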
How do we find out what these mutations do?
The Databricks solution
• The RGC established a major partnership with Databricks in 2017
• The RGC is leveraging the Databricks Unified Analytics Platform to create a unified data & compute infrastructure:
1. Developed efficient and unified data representations
2. Implemented scalable production workflows optimized for analyzing billions of rows
3. Created a unified codebase to enable all levels of users to perform computation
[Figure: MM (individuals × mutations) and TM (individuals × traits) feed an analytical engine that produces AR (mutation : trait).]
The RGC has developed easy-to-use web applications
to make the data accessible to a broad set of users
[Figure: architecture of RGC web applications. Components shown: web application, queries library, and Databricks cluster. A query flows to the cluster and results flow back, drawing on MM, TM, and AR.]
Goal: to enable everyone in the drug development process to easily access, analyze, and extract insights from the RGC’s data
The RGC Results Browser enables users to query
billions of association results
• Goal: Efficiently search billions of association
results across multiple cohorts
• The data set is updated when association results
from a new cohort become available
• Size of the current data set: >67 billion association
results (>200 billion results for the next update)
Optimizations to the ETL workflow have significantly
reduced the time to ingest the association results
• Association results are ingested and merged
from multiple cohorts
• Spark-based solution scales linearly with
cluster size
– Several optimizations have made the
process more efficient
– Migration of other QC processes into this workflow enables an end-to-end Spark solution
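The merge step can be pictured as tagging each cohort's rows with their origin and combining them into one table. The sketch below is plain Python rather than Spark, and the schema (`chrom`, `pos`, `trait`, `p_value`), the cohort names, and the function `merge_cohort_results` are all hypothetical; the deck does not describe the actual ETL code.

```python
def merge_cohort_results(results_by_cohort):
    """Union per-cohort association results into one list, tagged by cohort.

    Sorting by location is an assumption made here so that results for the
    same mutation from different cohorts end up next to each other.
    """
    merged = []
    for cohort in sorted(results_by_cohort):
        for row in results_by_cohort[cohort]:
            merged.append({**row, "cohort": cohort})
    merged.sort(key=lambda r: (r["chrom"], r["pos"], r["trait"], r["cohort"]))
    return merged


# Toy input: association results as delivered per cohort (made-up values).
results_by_cohort = {
    "cohort_b": [
        {"chrom": 2, "pos": 500, "trait": "trait1", "p_value": 1e-4},
    ],
    "cohort_a": [
        {"chrom": 2, "pos": 500, "trait": "trait1", "p_value": 3e-6},
        {"chrom": 1, "pos": 42, "trait": "trait1", "p_value": 0.2},
    ],
}

merged = merge_cohort_results(results_by_cohort)
for row in merged:
    print(row)
```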
Optimizing the partitioning scheme has significantly
reduced the query response time
• The input data is naturally organized by cohort; not query optimized
[Figure: gene density and association-result density plotted along chromosomal location, comparing partition density when partitioning on chromosomal boundaries with range partitioning that uses a variable range width and count.]
• Optimizations reduced the query response time from >30 minutes to <3 seconds
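One plausible reading of "variable range width & count" is equal-count range partitioning: boundaries are chosen so that each partition holds roughly the same number of results, which makes ranges narrow in dense regions and wide in sparse ones. The sketch below illustrates that idea in plain Python on toy `(chromosome, position)` keys. It is an assumption about the technique, not the RGC's actual scheme, and the helper names are hypothetical.

```python
from bisect import bisect_right


def build_range_boundaries(sorted_keys, n_partitions):
    """Pick split points so each range holds about the same number of keys."""
    n = len(sorted_keys)
    return [sorted_keys[(p * n) // n_partitions] for p in range(1, n_partitions)]


def partition_for(key, boundaries):
    """Index of the partition whose range contains the key."""
    return bisect_right(boundaries, key)


# Toy keys: a dense cluster on chromosome 1 plus a few sparse results.
keys = sorted(
    [(1, pos) for pos in range(100, 108)]
    + [(1, 5000), (2, 10), (2, 9000), (3, 50)]
)

boundaries = build_range_boundaries(keys, n_partitions=4)

# Count how many results land in each partition.
counts = [0] * 4
for key in keys:
    counts[partition_for(key, boundaries)] += 1

# A query for chromosome 1, positions 104 to 106, only needs the partitions
# that overlap that range instead of a scan over every partition.
first = partition_for((1, 104), boundaries)
last = partition_for((1, 106), boundaries)
touched = list(range(first, last + 1))

print("boundaries:", boundaries)
print("results per partition:", counts)
print("partitions touched by the query:", touched)
```

With cohort-organized input, a location query has to look at every file; with location-based ranges it can skip all partitions outside the requested interval.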
Demo notebook: mining association results and
extracting key insights
The RGC has recently identified a new potential drug
target for treating liver disease
Source: https://endpts.com/the-pcsk9-of-nash-regeneron-and-alnylam-join-forces-to-tackle-a-promising-target-for-severe-liver-diseases/
Liver disease can be detected based on enzyme levels
in the blood
• Two enzymes are typically analyzed to evaluate liver damage:
– AST (Aspartate transaminase)
– ALT (Alanine transaminase)
• Elevated levels of AST and ALT are indicative of liver damage
– Necessary but not sufficient
• Goal: identify loss-of-function mutations that are associated with lower AST and ALT levels
(protective effect)
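The stated goal translates into a filter over association results. The sketch below is a hedged illustration: the row schema (`mutation`, `consequence`, `trait`, `effect`, `p_value`), the toy values, and the function name are hypothetical. The threshold of 5e-8 is the conventional genome-wide significance cutoff and is used here as an assumption; the deck does not state which threshold was applied.

```python
def protective_lof_candidates(rows, traits=("AST", "ALT"), p_threshold=5e-8):
    """Loss-of-function mutations significantly associated with lower levels
    of every listed trait."""
    hits = {}
    for row in rows:
        if (
            row["consequence"] == "loss_of_function"
            and row["trait"] in traits
            and row["effect"] < 0  # negative effect = lower enzyme level
            and row["p_value"] < p_threshold
        ):
            hits.setdefault(row["mutation"], set()).add(row["trait"])
    return sorted(m for m, seen in hits.items() if seen == set(traits))


# Toy association results (made-up mutations and statistics).
rows = [
    {"mutation": "mut1", "consequence": "loss_of_function", "trait": "AST", "effect": -0.2, "p_value": 1e-12},
    {"mutation": "mut1", "consequence": "loss_of_function", "trait": "ALT", "effect": -0.3, "p_value": 1e-15},
    # Lower AST but higher ALT: not protective for both traits.
    {"mutation": "mut2", "consequence": "loss_of_function", "trait": "AST", "effect": -0.1, "p_value": 1e-9},
    {"mutation": "mut2", "consequence": "loss_of_function", "trait": "ALT", "effect": 0.05, "p_value": 1e-9},
    # Not a loss-of-function mutation.
    {"mutation": "mut3", "consequence": "missense", "trait": "AST", "effect": -0.4, "p_value": 1e-20},
    {"mutation": "mut3", "consequence": "missense", "trait": "ALT", "effect": -0.4, "p_value": 1e-20},
    # ALT association does not pass the significance threshold.
    {"mutation": "mut4", "consequence": "loss_of_function", "trait": "AST", "effect": -0.2, "p_value": 1e-10},
    {"mutation": "mut4", "consequence": "loss_of_function", "trait": "ALT", "effect": -0.2, "p_value": 1e-3},
]

candidates = protective_lof_candidates(rows)
print(candidates)
```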
Manhattan plot for AST: Several mutations in the
genome are associated with this liver trait
[Figure: Manhattan plot for AST, with the callout "What peak / mutation is the most interesting?" The final build of the slide labels the peak of interest HSD17B13.]
• The mutation of interest is associated with a broad spectrum of liver disease traits
• All of these associations confer protection from liver disease
Conclusions & lessons learned
• At Regeneron our goal is to bring the power of science to medicine and develop new medicines for
patients in need
• Incorporating human genetics evidence is critical for pursuing modern drug discovery; the RGC is
building one of the world’s largest genetics databases to identify new potential drug targets
• Our strategic partnership with Databricks has enabled us to build a state-of-the-art data science
platform from scratch by:
– Developing efficient and unified data representations
– Building out scalable workflows to mine billions of rows and addressing key bottlenecks (e.g.,
reducing the ETL time from weeks to hours and optimizing the query response time to <3s)
– Creating a unified codebase to enable all levels of users to perform computation
• Most importantly, the Databricks Unified Analytics Platform brings our data, tools, and people together
to accelerate innovation
Acknowledgements
• RGC-LT
– Alan Shuldiner
– Aris Baras
– Aris Economides
– Jeffrey Reid
– John Overton
• RGC-GI
– Alicia Hawes
– Ashish Yadav
– Claire Chai
– Evan Maxwell
– Gisu Eom
– Jeff Staples
– John Penn
– Leland Barnard
– Shareef Khalid
– Sheldon Bai
– Suganthi Balasubramanian
– Young Hahn
• RGC
– Alexander Li
– Alexander Lopez
– Amy Damask
– Charlie Paulding
– Claudia Schurmann
– Colm O’Dushlaine
– Cristopher Van Hout
– Dylan Sun
– Jan Freudenberg
– Kavita Praveen
– Kia Manoochehri
– Lauren Gurski
– Manasi Pradhan
– Mike Norsen
– Nehal Gosalia
– Nila Banerjee
– Rick Ulloa
– Shane McCarthy
– Tanya Teslovich Dostal
– Tony Marcketta
• Databricks
– Ali Ghodsi
– Ali Hodroj
– Allan Marcos
– Ambareesh Kulkarni
– Bavesh Patel
– Christopher Hoshino-Fish
– David Weaver
– Francis Gerace
– Hossein Falaki
– Ion Stoica
– Juliusz Sompolski
– Li Yu
– Navid Bazzazzadeh
– Paris Georgallis
– Ram Sriharsha
– Ronak Shah
– Shiva Bhattacharjee
– Vida Ha
– Yongsheng Huang
• REGN-IT
– Abdul Shaik
– Allen Chiang
– Brandon Fetch
– Christopher McCabe
– Dale Cochran
– David Glosser
– Long Le
– Michael Phillips
– Mohammad Saeed
– Pat Leblanc
– Sal Mineo
– Shaw Nawaz
– Shiva Ravi
– Stephen Huvane
– Vin Dahake
– Weylin Preodor
Questions?
https://tinyurl.com/yaqwl2bt
We are hiring!

 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Insights from Building the Future of Drug Discovery with Apache Spark with Lukas Habegger

  • 1. Lukas Habegger, Associate Director Bioinformatics Regeneron Genetics Center (RGC) Insights from Building the Future of Drug Discovery with Apache Spark #EntSAIS14
  • 2. Outline • Current state of drug discovery and development • Benefits of leveraging human genetics data • Overview of the Regeneron Genetics Center (RGC) • Challenges on the road to delivering on the promises of big data and genomics in drug discovery • Overview of how the RGC leverages Databricks’ Unified Analytics Platform and Apache Spark • Discussion of key engineering innovations • Conclusions & lessons learned
  • 3. Current state of drug discovery and development: Maximizing chances of success with human genetics • 95% of experimental medicines fail in development; costs exceed $2B per approved drug • Higher probability of success for drugs with strong human genetics evidence • >$100B spent on worldwide R&D by the biopharma industry → only 10–20 new drugs per year • Target bottleneck: <1,000 genes (<5% of all genes) account for the targets of all drugs currently in development • You cannot pursue modern drug discovery and development without incorporating human genetics • Sources: Herper M. Forbes.com. The Truly Staggering Cost of Inventing New Drugs. https://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/#355471a54a94. Feb. 10, 2012. Herper M. Forbes.com. How the Staggering Cost of Inventing New Drugs Is Shaping the Future of Medicine. https://www.forbes.com/sites/matthewherper/2013/08/11/how-the-staggering-cost-of-inventing-new-drugs-is-shaping-the-future-of-medicine/#30f1a95113c3. Aug. 11, 2013. Booth B. Forbes.com. A Billion Here, A Billion There: The Cost of Making a Drug Revisited. https://www.forbes.com/sites/brucebooth/2014/11/21/a-billion-here-a-billion-there-the-cost-of-making-a-drug-revisited/#6034e7f226a8. Nov. 21, 2014. Nat Genet. 2015 Aug;47(8):856-60. doi: 10.1038/ng.3314. Nat Rev Drug Discov. 2013 Aug;12(8):581-94. doi: 10.1038/nrd4051. Nat Rev Drug Discov. 2017 Jan;16(1):19-34. doi: 10.1038/nrd.2016.230.
  • 4. Why is human genetics such a powerful tool for drug discovery? [Figure: a DNA mutation (example: A → T); impact on gene product: loss-of-function, neutral, or gain-of-function; impact on disease: protective, neutral, or damaging.]
  • 7. Why is human genetics such a powerful tool for drug discovery? [Figure: a DNA mutation (example: A → T) or a drug; impact on gene product: loss-of-function, neutral, or gain-of-function; impact on disease: protective, neutral, or damaging.]
  • 8. PCSK9: A success story where human genetics evidence played a key role in drug development • Loss-of-function mutations in PCSK9 protect against heart disease • Regeneron developed a drug to block PCSK9, which has been shown to be effective in preventing heart disease [Figure: a DNA mutation (example: A → T) or a drug; impact on gene product: loss-of-function, neutral, or gain-of-function; impact on disease: protective, neutral, or damaging.]
  • 9. The goal of the RGC is to build one of the world’s largest genotype-phenotype resources • Regeneron has a long history of commitment to genetics-based science, and a track record of integrating human genetics into development programs, delivering new medicines to patients • Regeneron established the Regeneron Genetics Center (RGC) in 2014 • Goal: build one of the world’s most comprehensive genetics databases to supplement our state-of-the-art drug development pipeline • To date, the RGC has sequenced DNA from more than 300,000 individuals
  • 10. Breadth of human genetics resources: RGC network of 60+ collaborators representing over 1 million samples • Founder populations • Phenotype-specific cohorts • Family studies • General population
  • 12. RGC collaboration with UK Biobank: RGC will sequence ~500K participants over 3-5 years
  • 13. Automation is key to enable large-scale data production and analysis • Automated biobank (1.4M samples) • Library preparation (>300,000 samples / year) • Sequencing instruments (>300,000 samples / year) • 100% cloud-based informatics & analysis • A scalable informatics platform is needed to analyze this data and make it accessible to a broad set of users
  • 14. How do we analyze our data to gain novel insights? Approach and desired goal • Approach: 1. Sequence a large number of individuals to identify their mutations 2. Obtain paired clinical data (traits derived from de-identified electronic medical records) 3. Test for correlations/associations between all mutations and traits 4. Mine association results in various ways to gain insights [Figure, desired goal: a Mutation Matrix (MM, individuals × mutations) and a Trait Matrix (TM, individuals × traits) feed an analytical engine that produces Association Results (AR, one per mutation : trait pair).]
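Steps 3 and 4 of the approach can be sketched in a few lines. This is a minimal illustration only, assuming a toy mutation matrix and trait matrix held as plain Python dictionaries and using the Pearson correlation as a stand-in association statistic. The names `mutation_matrix`, `trait_matrix`, and `associate` and all values are hypothetical; the slides do not say which statistical test the RGC uses.

```python
from itertools import product
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no association signal
    return cov / (var_x * var_y) ** 0.5


def associate(mutation_matrix, trait_matrix):
    """Test every mutation against every trait (the MM x TM cross join).

    Both inputs map a column name to one value per individual, with
    individuals listed in the same order in both matrices.
    """
    return {
        (mutation, trait): pearson(genotypes, values)
        for (mutation, genotypes), (trait, values)
        in product(mutation_matrix.items(), trait_matrix.items())
    }


# Toy data for four individuals (hypothetical values).
mutation_matrix = {
    "mut_1": [0, 1, 1, 2],  # copies of the alternate allele
    "mut_2": [1, 0, 1, 0],
}
trait_matrix = {
    "trait_a": [10.0, 20.0, 20.0, 30.0],
    "trait_b": [5.0, 5.0, 6.0, 6.0],
}

association_results = associate(mutation_matrix, trait_matrix)

# Step 4: mine the results, here by ranking on absolute correlation.
ranked = sorted(association_results.items(), key=lambda kv: -abs(kv[1]))
```

The result is one entry per mutation : trait pair, which is why the output grows as the product of the two input dimensions.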
  • 15. How do we analyze our data to gain novel insights? It’s more complicated – lack of data unification • Data is decentralized and stored in different formats • Data is organized in different ways (e.g., not squared off, transposed, custom representations and indexing schemes) • Asking simple questions requires many time-consuming data wrangling steps [Figure: desired goal versus reality; in reality the inputs and results are held as separate pVCF and txt files.]
  • 16. How do we analyze our data to gain novel insights? It’s more complicated – data from multiple cohorts • The RGC has data from multiple collaborators • Data is not always consistent • Limited functionality to unify / aggregate matrices from multiple cohorts [Figure: desired goal versus reality; in reality each cohort has its own matrices (GT, MM, TM) and results files in pVCF and txt formats.]
  • 17. How do we analyze our data to gain novel insights? It’s more complicated – scalability issues • Large inputs (MM & TM) • MM x TM cross join • Massive outputs (AR) [Figure: desired goal versus reality; scale labels in the figure: 10s of millions, 10s of thousands, 100s of billions.]
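A back-of-the-envelope calculation shows why the cross join dominates. The sketch below assumes the figure's scale labels refer to mutations (10s of millions), traits (10s of thousands), and association results (100s of billions); that mapping and the concrete counts are illustrative assumptions, not figures stated on the slide.

```python
from itertools import product

# Assumed orders of magnitude (illustrative values, not RGC figures).
n_mutations = 20_000_000  # "10s of millions"
n_traits = 10_000         # "10s of thousands"

# A cross join pairs every mutation with every trait.
n_results = n_mutations * n_traits
print(f"{n_results:,} mutation : trait pairs")  # 200,000,000,000 mutation : trait pairs

# The same pairing on a toy scale: 3 mutations x 2 traits = 6 pairs.
pairs = list(product(["mut_1", "mut_2", "mut_3"], ["trait_a", "trait_b"]))
```

Because the output size is the product of the two inputs, adding traits or mutations grows the results table multiplicatively.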
  • 18. How do we find out what these mutations do? The Databricks solution • The RGC established a major partnership with Databricks in 2017 • The RGC is leveraging the Databricks Unified Analytics Platform to create a unified data & compute infrastructure: 1. Developed efficient and unified data representations 2. Implemented scalable production workflows optimized for analyzing billions of rows 3. Created a unified codebase to enable all levels of users to perform computation
  • 19. The RGC has developed easy-to-use web applications to make the data accessible to a broad set of users • Goal: to enable everyone in the drug development process to easily access, analyze, and extract insights from the RGC’s data [Figure: architecture of RGC web applications; labels: Web Application, Queries, Library, Databricks Cluster, Query Results.]
  • 20. The RGC Results Browser enables users to query billions of association results • Goal: Efficiently search billions of association results across multiple cohorts • The data set is updated when association results from a new cohort become available • Size of the current data set: >67 billion association results (>200 billion results for the next update)
  • 21. Optimizations to the ETL workflow have significantly reduced the time to ingest the association results • Association results are ingested and merged from multiple cohorts • Spark-based solution scales linearly with cluster size – Several optimizations have made the process more efficient – Migration of other QC processes into this workflow enables an end-to-end Spark solution
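The merge step can be illustrated at toy scale. The sketch below is plain Python rather than Spark, and it is not the RGC's workflow: the function name `merge_cohort_results`, the row layout, and the QC rule (a p-value must lie in (0, 1]) are all hypothetical, chosen only to show per-cohort results being combined and quality-checked in a single pass.

```python
from collections import defaultdict


def merge_cohort_results(results_by_cohort, is_valid):
    """Merge per-cohort association results into one table.

    `results_by_cohort` maps a cohort name to rows of
    (mutation, trait, p_value). Rows failing `is_valid` are dropped.
    Returns {(mutation, trait): [(cohort, p_value), ...]}.
    """
    merged = defaultdict(list)
    for cohort, rows in results_by_cohort.items():
        for mutation, trait, p_value in rows:
            if is_valid(p_value):
                merged[(mutation, trait)].append((cohort, p_value))
    return dict(merged)


# Hypothetical input: two cohorts, one row with an impossible p-value.
results_by_cohort = {
    "cohort_a": [("mut_1", "trait_a", 0.01), ("mut_2", "trait_a", 1.5)],
    "cohort_b": [("mut_1", "trait_a", 0.2)],
}

merged = merge_cohort_results(
    results_by_cohort,
    is_valid=lambda p: 0.0 < p <= 1.0,  # assumed QC rule
)
```

Keying on the mutation : trait pair places results for the same pair from different cohorts side by side.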
  • 22. Optimizing the partitioning scheme has significantly reduced the query response time • The input data is naturally organized by cohort; not query optimized • Optimizations reduced the query response time from >30 minutes to <3 seconds [Figure: association results by chromosomal location, with gene density and results density; the range-partitioned layout follows chromosomal boundaries, with variable range width and count and a partition density track.]
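One way to obtain variable-width range partitions is to cut each chromosome's sorted positions into ranges holding roughly the same number of rows, so that dense regions get narrow ranges and sparse regions get wide ones. The sketch below is an assumption about how such a scheme could work, not the RGC's implementation; `build_partitions`, `partition_for`, and the toy positions are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_partitions(rows, target_rows):
    """Split (chromosome, position) rows into range partitions.

    Ranges never cross a chromosome boundary, and each range holds
    roughly `target_rows` rows, so dense regions get narrower ranges.
    Returns {chromosome: sorted list of range start positions}.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in rows:
        by_chrom[chrom].append(pos)
    starts = {}
    for chrom, positions in by_chrom.items():
        positions.sort()
        # Take every target_rows-th position as a range start;
        # the set removes repeated starts caused by duplicate positions.
        starts[chrom] = sorted(
            {positions[i] for i in range(0, len(positions), target_rows)}
        )
    return starts


def partition_for(starts, chrom, pos):
    """Return (chromosome, range index) of the partition holding `pos`."""
    index = bisect_right(starts[chrom], pos) - 1
    return chrom, max(index, 0)


# Toy data: chromosome "1" is dense near position 100 and sparse elsewhere.
rows = [("1", p) for p in (100, 110, 120, 130, 5000, 90000)]
rows += [("2", 50), ("2", 60)]

starts = build_partitions(rows, target_rows=2)
location = partition_for(starts, "1", 125)
```

A query for a genomic region then needs to read only the partitions whose ranges overlap it, instead of scanning data organized by cohort.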
  • 23. Demo notebook: mining association results and extracting key insights
  • 24. The RGC has recently identified a new potential drug target for treating liver disease. Source: https://endpts.com/the-pcsk9-of-nash-regeneron-and-alnylam-join-forces-to-tackle-a-promising-target-for-severe-liver-diseases/
  • 25. Liver disease can be detected based on enzyme levels in the blood • Two enzymes are typically analyzed to evaluate liver damage: – AST (Aspartate transaminase) – ALT (Alanine transaminase) • Elevated levels of AST and ALT are indicative of liver damage – Necessary but not sufficient • Goal: identify loss-of-function mutations that are associated with lower AST and ALT levels (protective effect)
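The stated goal amounts to a filter over association results. The sketch below assumes a simple record layout; the field names (`is_lof`, `effect`, `p_value`), the mutation names, and the numbers are hypothetical. The threshold of 5e-8 is the conventional genome-wide significance level, used here as an assumption rather than a value given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationResult:
    mutation: str
    trait: str      # e.g. "AST" or "ALT"
    is_lof: bool    # True for a loss-of-function mutation
    effect: float   # negative means a lower enzyme level
    p_value: float


def protective_lof_candidates(results, traits=("AST", "ALT"), alpha=5e-8):
    """Return loss-of-function mutations that are significantly
    associated with lower levels of every trait in `traits`."""
    hits = {}
    for r in results:
        if r.is_lof and r.trait in traits and r.effect < 0 and r.p_value < alpha:
            hits.setdefault(r.mutation, set()).add(r.trait)
    return sorted(m for m, seen in hits.items() if seen == set(traits))


# Hypothetical association results.
results = [
    AssociationResult("mut_x", "AST", True, -0.10, 1e-10),
    AssociationResult("mut_x", "ALT", True, -0.20, 1e-12),
    AssociationResult("mut_y", "AST", True, -0.10, 1e-10),
    AssociationResult("mut_y", "ALT", True, 0.10, 1e-9),    # raises ALT
    AssociationResult("mut_z", "AST", False, -0.30, 1e-15),  # not LoF
    AssociationResult("mut_z", "ALT", False, -0.30, 1e-15),
]

candidates = protective_lof_candidates(results)
```

Only `mut_x` passes: it is loss-of-function and lowers both enzymes at the assumed significance level.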
  • 26. Manhattan plot for AST: Several mutations in the genome are associated with this liver trait • What peak / mutation is the most interesting?
  • 27. Manhattan plot for AST: Several mutations in the genome are associated with this liver trait • What peak / mutation is the most interesting? HSD17B13
  • 29. The mutation of interest is associated with a broad spectrum of liver disease traits • All of these associations confer protection from liver disease
  • 30. Conclusions & lessons learned • At Regeneron our goal is to bring the power of science to medicine and develop new medicines for patients in need • Incorporating human genetics evidence is critical for pursuing modern drug discovery; the RGC is building one of the world’s largest genetics databases to identify new potential drug targets • Our strategic partnership with Databricks has enabled us to build a state-of-the-art data science platform from scratch by: – Developing efficient and unified data representations – Building out scalable workflows to mine billions of rows and addressing key bottlenecks (e.g., reducing the ETL time from weeks to hours and optimizing the query response time to <3s) – Creating a unified codebase to enable all levels of users to perform computation • Most importantly, the Databricks Unified Analytics Platform brings our data, tools, and people together to accelerate innovation
  • 31. Acknowledgements • RGC-LT – Alan Shuldiner – Aris Baras – Aris Economides – Jeffrey Reid – John Overton • RGC-GI – Alicia Hawes – Ashish Yadav – Claire Chai – Evan Maxwell – Gisu Eom – Jeff Staples – John Penn – Leland Barnard – Shareef Khalid – Sheldon Bai – Suganthi Balasubramanian – Young Hahn • RGC – Alexander Li – Alexander Lopez – Amy Damask – Charlie Paulding – Claudia Schurmann – Colm O’Dushlaine – Cristopher Van Hout – Dylan Sun – Jan Freudenberg – Kavita Praveen – Kia Manoochehri – Lauren Gurski – Manasi Pradhan – Mike Norsen – Nehal Gosalia – Nila Banerjee – Rick Ulloa – Shane McCarthy – Tanya Teslovich Dostal – Tony Marcketta • Databricks – Ali Ghodsi – Ali Hodroj – Allan Marcos – Ambareesh Kulkarni – Bavesh Patel – Christopher Hoshino-Fish – David Weaver – Francis Gerace – Hossein Falaki – Ion Stocia – Juliusz Sompolsk – Li Yu – Navid Bazzazzadeh – Paris Georgallis – Ram Sriharsha – Ronak Shah – Shiva Bhattacharjee – Vida Ha – Yongsheng Huang • REGN-IT – Abdul Shaik – Allen Chiang – Brandon Fetch – Christopher McCabe – Dale Cochran – David Glosser – Long Le – Michael Phillips – Mohammad Saeed – Pat Leblanc – Sal Mineo – Shaw Nawaz – Shiva Ravi – Stephen Huvane – Vin Dahake – Weylin Preodor