Managing R&D data on parallel compute infrastructure
Prepared for the 2021 Data + AI Summit
April 6, 2021
Boston +1 617 557 5800
© 2021 ZS 2
5847_Databricks_Data__AI_Summit_Talk_v4.1_Final
Topics
1. Introduction
2. NGS data persistence strategies
3. NGS mapping and alignment strategies
ZS works closely with its clients to drive customer value and create impact across the organization
ZSers who are committed to helping active clients and their customers thrive
9,500+
190+
clients have experienced ZS differentiation across 30 industries in over 90 countries, including:
1,200+
80+ therapy areas of experience
100% of the top 50 pharma are our clients
90%+ of our work in pharma and medtech
ZS is a Premier Databricks partner with a strong track record of enabling clients to take full advantage of data by deploying ZS’s proven assets and the Databricks Unified Data Analytics Platform to serve as a one-stop shop for all users involved in an end-to-end data engineering, data analytics, and data science pipeline.
The ZS R&D Excellence team partners with clinical, medical and scientific clients to discover and develop innovative medicines that improve patients’ lives
R&D areas of excellence
Our experts work side by side with clients, leveraging analytics and technology to create solutions that work in the real world from R&D to commercialization.
Biomedical research
- Scientific solutions
- Bioinformatics and in-silico solutions
- Scientific and research strategy
- Research and early development technology platforms
Real world evidence (RWE)
- Integrated evidence strategy
- Real world data (RWD) strategy
- Observational research
- Rapid insight solutions
- RWE benchmarking
- Evidence communication
- Actionable RWE
- RWD science
Medical affairs
- Global evidence planning
- Medical org design
- Scientific communication strategy
- Medical science liaison design and support
Global health economics and outcomes research
- Economic modeling
- Value communication strategy
- Patient reported outcomes
- Literature review
Clinical development
- Trial optimization
- Quality risk monitoring
- Biometrics and clinical data tech solutions
- Site and patient engagement
- Digital and virtual strategy
About ZS R&D
- 750+ professionals focused on R&D programs
- 60+ million invested in R&D data, analytics & technology assets
- Clinical Design Center
- Working with over 50 clients on R&D programs
Problem statement
Current research and development landscapes in biopharma have been plagued by years of not following the FAIR (findable, accessible, interoperable, reusable) principles of data management. This has limited the ability to fully democratize the use of this data and has stifled areas of drug development and artificial intelligence-based medicine.
Moving raw data to the cloud for analysis
Pipeline stages: LIMS/ELN, base calls, ETL, read preparation, analysis

Mature
LIMS/ELN: Automation
- Instrument to cloud
  - Metadata capture
  - Resilient key
- Systems integration
  - Sample descriptions
Base calls: Conversion to FASTQ
- Can be performed on instrument
  - Custom processes
- Automatically move FASTQ or BCL to cloud
ETL: Converting FASTQ to dataset objects
- Defining common models for individual FASTQ reads
- Persisting these as data products
Read preparation: Preprocessing
- Dataset parallelism in executors
  - Quality control
  - Adapter and synthetic ligations trimmed
Analysis: Preprocessed data products
- Mapping to reference data products
- Creation of specific data products for a pipeline

Immature
LIMS/ELN: Manual
- No integration to LIMS
- Metadata placed in file names
Base calls: Files to demultiplex and convert on instrument
- Sharing of files by email or personal share drives
ETL: No ETL
- All data remains as compressed files
Read preparation: Preprocessing
- Run on a single node
  - Limited parallelism
  - Significantly long times
  - Secondary raw data artifacts
Analysis: Run against raw reference/genomic features
- Mapped to entire reference genome
- Significant number of useless artifacts generated
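As an illustration of "defining common models for individual FASTQ reads", the following is a minimal sketch in plain Python. The class name `FastqRead`, its fields, and the `parse_fastq` helper are assumptions made for this example and are not taken from the talk; it also assumes the common four-line FASTQ layout with Phred+33 quality encoding.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FastqRead:
    """One FASTQ record. Field names are illustrative, not from the talk."""
    header: str     # header text without the leading '@'
    sequence: str   # base calls
    quality: str    # Phred+33 quality string, same length as sequence

    def mean_quality(self) -> float:
        """Mean Phred score, assuming Phred+33 encoding."""
        if not self.quality:
            return 0.0
        return sum(ord(c) - 33 for c in self.quality) / len(self.quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRead]:
    """Yield FastqRead objects from FASTQ text lines (four lines per record)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    while True:
        header = next(stripped, None)
        if header is None or header == "":
            return  # end of input
        sequence = next(stripped, "")
        separator = next(stripped, "")
        quality = next(stripped, "")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in: {header!r}")
        yield FastqRead(header[1:], sequence, quality)


# Tiny in-memory example instead of a real file.
example = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "GGCA\n", "+\n", "!!II\n",
]
reads = list(parse_fastq(example))
```

A typed record like this is the kind of object that could then be persisted as a structured data product rather than left inside a compressed file.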
Spark strategies for raw data ingestion
Spark and Databricks in raw data ingestion
- Scalable clusters that can improve processing time by orders of magnitude
- The object-oriented nature of Spark Datasets allows models for specific versions of FASTQ headers
- Spark Structured Streaming and stepwise analytics
- Oxford Nanopore and FAST5 sequencing pipelines
Ingesting raw FASTQ to a dataset
Datasets are parallelizable and fit multiple platforms
Streaming analytics
From raw FASTQ to a structured dataset persisted to the data lake
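The point about modeling specific versions of FASTQ headers can be sketched as follows. This example assumes the two widely seen Illumina header layouts (the Casava 1.8+ layout and the older `#index/read` layout); the talk does not say which layouts were modeled, and the `ReadHeader` class and `parse_header` function are hypothetical names for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadHeader:
    """Structured view of a FASTQ header. Field names are illustrative."""
    layout: str                        # "casava_1.8" or "legacy"
    instrument: str
    lane: int
    tile: int
    x: int
    y: int
    read_number: int
    run_number: Optional[int] = None   # only present in the 1.8+ layout
    flowcell_id: Optional[str] = None  # only present in the 1.8+ layout
    index: Optional[str] = None


# instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
CASAVA_18 = re.compile(
    r"^(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>\d+):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S*)$"
)
# instrument:lane:tile:x:y#index/read
LEGACY = re.compile(
    r"^(?P<instrument>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/\s]+)/(?P<read>\d+)$"
)


def parse_header(header: str) -> ReadHeader:
    """Parse a header line into a ReadHeader, trying the newer layout first."""
    text = header.strip()
    if text.startswith("@"):
        text = text[1:]
    m = CASAVA_18.match(text)
    if m:
        return ReadHeader(
            layout="casava_1.8", instrument=m["instrument"],
            lane=int(m["lane"]), tile=int(m["tile"]),
            x=int(m["x"]), y=int(m["y"]), read_number=int(m["read"]),
            run_number=int(m["run"]), flowcell_id=m["flowcell"],
            index=m["index"] or None,
        )
    m = LEGACY.match(text)
    if m:
        return ReadHeader(
            layout="legacy", instrument=m["instrument"],
            lane=int(m["lane"]), tile=int(m["tile"]),
            x=int(m["x"]), y=int(m["y"]), read_number=int(m["read"]),
            index=m["index"],
        )
    raise ValueError(f"unrecognized header layout: {header!r}")


new_style = parse_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
old_style = parse_header("@HWUSI-EAS100R:6:73:941:1973#0/1")
```

Both layouts land in one record type, so downstream steps can filter or group by lane, tile, or flow cell without re-parsing strings.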
Databricks as a platform for analysis
Notebooks and user interaction
Data scientists, process developers and statisticians can interact with data products
- Controlling data quality through definitions of external tables
- Ad hoc analytics with creation of silver and bronze level data products
- Benchmarking, performance enhancements and experimental execution
Jobs API
Highly controlled processes can be created as traditional Spark application artifacts (.jar, .whl, .egg, etc.)
- Data ingestion from raw sources
- Demultiplexing
- Controlled analytics pipelines
Notebooks can also be the sources and definitions of Spark execution jobs
- Concordance testing
- Visualizations and human-facing components
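To make the split between packaged artifacts and notebook-defined jobs concrete, here is a sketch of a two-step job definition built as a plain Python dictionary. It follows the request shape of the Databricks Jobs API 2.1 as best understood here, and the field names should be checked against the current API reference. The job name, class name, paths, runtime version, and node type are placeholders invented for this example, and nothing is sent to any service.

```python
import json


def build_job_spec(patient_id: str) -> dict:
    """Build a hypothetical two-task job: a JAR ingestion step, then a notebook."""
    cluster = {
        "spark_version": "9.1.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",         # placeholder node type
        "num_workers": 4,
    }
    return {
        "name": f"ngs-pipeline-{patient_id}",
        "tasks": [
            {
                # Controlled process shipped as a packaged Spark artifact.
                "task_key": "ingest",
                "spark_jar_task": {
                    "main_class_name": "com.example.ngs.Ingest",  # hypothetical
                    "parameters": [patient_id],
                },
                "libraries": [{"jar": "dbfs:/FileStore/jars/ngs-pipeline.jar"}],
                "new_cluster": cluster,
            },
            {
                # Notebook-defined step that runs only after ingestion succeeds.
                "task_key": "concordance",
                "depends_on": [{"task_key": "ingest"}],
                "notebook_task": {
                    "notebook_path": "/Shared/ngs/concordance",  # hypothetical
                    "base_parameters": {"patient_id": patient_id},
                },
                "new_cluster": cluster,
            },
        ],
    }


spec = build_job_spec("patient-0001")
payload = json.dumps(spec, indent=2)  # body that would be posted to the jobs endpoint
```

Keeping the definition as data makes it easy to generate one job per patient or per flow cell from the same template.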
Replacing traditional methods for creating scalable, high-throughput, time-sensitive pipelines
- Step-wise execution from datasets
- Creating data products that have matchable entities to direct sequence products
- BWA, Bowtie and other single-node alignment methods
- Implementation of matching and identification methods as Spark UDFs
[Figure: approaches positioned by mapping throughput (low to high) and by aggregation and analysis (low to high)]
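The idea of moving matching and identification logic into a function that Spark can apply per read can be sketched in plain Python. The matching rule below (comparing the start of each read against known feature sequences, with a small Hamming-distance tolerance) is an assumption chosen for illustration; the talk does not describe the actual matching method. Function and feature names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def hamming_within(a: str, b: str, max_mismatches: int) -> bool:
    """True when equal-length strings differ at no more than max_mismatches positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def match_feature(read: str, features: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the name of the first feature matching the start of the read, else None."""
    for name, feature_seq in features.items():
        prefix = read[: len(feature_seq)]
        if hamming_within(prefix, feature_seq, max_mismatches):
            return name
    return None


def count_matches(reads: Iterable[str], features: Dict[str, str],
                  max_mismatches: int = 1) -> Counter:
    """Aggregate step: number of reads assigned to each feature."""
    counts: Counter = Counter()
    for read in reads:
        name = match_feature(read, features, max_mismatches)
        if name is not None:
            counts[name] += 1
    return counts


# Toy reference features and reads, invented for this example.
features = {"ampA": "ACGTAC", "ampB": "TTGGCC"}
reads = ["ACGTACGGA", "ACGAACGGA", "TTGGCCAAT", "GGGGGGGGG", "TTG"]
counts = count_matches(reads, features)
```

Because `match_feature` is a pure function of one read, it is the kind of logic that could be registered with `pyspark.sql.functions.udf` and applied to a column of reads across executors, with the counting step expressed as a group-by.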
Use case for clinical pipeline
Liquid biopsy NGS analysis
Amplicon-based pipeline
- Anticipated onboarding of tens of millions of patients
- Need analysis time per patient to be in the realm of 1-2
LIMS, ELN and data transfer strategy
LIMS integrated directly into the file transfer to cloud storage (ADLSv2, S3)
- Metadata and sample information tracked by a resilient key
- Databricks Delta Lake implementation for data products
Amplicon-based analysis
Approximately 500,000 known sequence features were analyzed
- Data products generated directly from reference sources
- Products have reference, provenance and versioning metadata
- Mapping strategy that implements a UDF to determine sequence matches
Post-mapping and scientific analysis
Able to cut the mapping and amplicon identification process from four hours to less than one minute per patient
- Demultiplex flow-cell files and spin up one Databricks cluster per patient
- Stepwise job definitions allow joining executors at certain points in analysis
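One way to read "tracked by a resilient key" is a stable identifier derived from LIMS identifiers, so that metadata travels in a record rather than in file names. The sketch below is an assumption about how such a key could be built, not the scheme used in the project; the namespace, field names, and storage layout are all hypothetical.

```python
import uuid
from dataclasses import dataclass

# Hypothetical namespace for this sketch; any fixed UUID would serve.
SAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "samples.example.org")


def resilient_key(lims_sample_id: str, run_id: str) -> str:
    """Derive a deterministic key: the same sample and run always map to the same key."""
    return str(uuid.uuid5(SAMPLE_NAMESPACE, f"{lims_sample_id}|{run_id}"))


@dataclass(frozen=True)
class SampleManifest:
    """Metadata record stored alongside the raw file instead of in its name."""
    key: str
    lims_sample_id: str
    run_id: str
    storage_path: str


def build_manifest(lims_sample_id: str, run_id: str, container: str) -> SampleManifest:
    """Create the manifest and a key-based storage path (layout is illustrative)."""
    key = resilient_key(lims_sample_id, run_id)
    path = f"{container}/raw/{key}/reads.fastq.gz"
    return SampleManifest(key, lims_sample_id, run_id, path)


manifest = build_manifest("S-000123", "RUN-42", "s3://example-bucket")
```

A deterministic key means a re-sent file resolves to the same location and metadata row, which is one plausible sense of "resilient" here.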
Use case for clinical pipeline
Pipeline stages: Raw data ingest, Trim adapters, Identify and map, Post hoc analysis, Train ML for AI
Value dimensions: Technical value, Scientific value
R&D data lake
- Co-located data
- Archival cloud storage
- HA/DR
QC and prep reads
- Able to reuse methods that are open source
- Databricks and other vendors creating utilities in Spark
Parallelization
- Orders of magnitude of scalability
- Data no longer tracked in databases and
Biologic relevance of data
- Use of scientific-friendly languages
  - Python
  - R
- Interface with bins
Enhance the ability to gain data value
- Data lakes
- Structured data catalogs
Consolidated lake
- Democratization of data
- FAIR principles
Amplification strategies
- Quality assurance-based methods for amplification
- Unique parent barcoding
Novel methods
- Lower barrier to writing novel methods to support novel science
Scalable application of powerful open-source technologies
- Bioconductor
- SparkR
Growing structured and consolidated lake
- Variational autoencoders
- Machine learning (ML) models
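The "QC and prep reads" and "Trim adapters" stages can be illustrated with a small per-read routine. This is a simplified sketch, not the method used in the pipeline: it removes a 3' adapter by exact match (full adapter anywhere in the read, or a partial adapter at the read end) and filters on mean Phred+33 quality. The adapter string, thresholds, and function names are example values chosen for this illustration.

```python
from typing import Tuple


def trim_adapter(sequence: str, quality: str, adapter: str,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read at the adapter; also handles an adapter truncated by the read end."""
    cut = sequence.find(adapter)
    if cut == -1:
        # Look for the longest adapter prefix that forms the end of the read.
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                cut = len(sequence) - overlap
                break
    if cut == -1:
        return sequence, quality  # no adapter found
    return sequence[:cut], quality[:cut]


def mean_phred(quality: str) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(c) - 33 for c in quality) / len(quality)


def passes_qc(sequence: str, quality: str,
              min_length: int = 4, min_mean_quality: float = 20.0) -> bool:
    """Keep reads that are long enough and of sufficient mean quality after trimming."""
    return len(sequence) >= min_length and mean_phred(quality) >= min_mean_quality


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence used only for this sketch

full = trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", "I" * 24, ADAPTER)
partial = trim_adapter("ACGTACGTAGATC", "I" * 13, ADAPTER)
untouched = trim_adapter("ACGTACGT", "I" * 8, ADAPTER)
```

Because each read is processed independently, a routine of this shape parallelizes naturally across executors; production trimmers additionally tolerate mismatches in the adapter.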
Contact Info
Andrew S. Brown, Ph.D.
Strategy & Architecture Manager
https://www.linkedin.com/in/andrew-brown-73917014b/
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
How OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman MaryaHow OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman MaryaData Con LA
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Introducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on DatabricksIntroducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on DatabricksDatabricks
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseDatabricks
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...DataWorks Summit
 
Data Virtualization and ETL
Data Virtualization and ETLData Virtualization and ETL
Data Virtualization and ETLLily Luo
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin RobbinsData Con LA
 
The Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphThe Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphCambridge Semantics
 
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep LearningRisk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep LearningCambridge Semantics
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Itai Yaffe
 
Building Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceBuilding Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceDatabricks
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsSnapLogic
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientistsdatamantra
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
SplunkSummit 2015 - Real World Big Data Architecture
SplunkSummit 2015 -  Real World Big Data ArchitectureSplunkSummit 2015 -  Real World Big Data Architecture
SplunkSummit 2015 - Real World Big Data ArchitectureSplunk
 

Was ist angesagt? (20)

What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
How OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman MaryaHow OpenTable uses Big Data to impact growth by Raman Marya
How OpenTable uses Big Data to impact growth by Raman Marya
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Introducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on DatabricksIntroducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on Databricks
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use Case
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
 
Data Virtualization and ETL
Data Virtualization and ETLData Virtualization and ETL
Data Virtualization and ETL
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 
The Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphThe Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge Graph
 
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep LearningRisk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?
 
Building Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceBuilding Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field Experience
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
SplunkSummit 2015 - Real World Big Data Architecture
SplunkSummit 2015 -  Real World Big Data ArchitectureSplunkSummit 2015 -  Real World Big Data Architecture
SplunkSummit 2015 - Real World Big Data Architecture
 

Ähnlich wie Managing R&D Data on Parallel Compute Infrastructure

Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesCambridge Semantics
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)Matt Barnes
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Delivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data FabricDelivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data FabricDenodo
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
Trends und Anwendungsbeispiele im Life Science Bereich
Trends und Anwendungsbeispiele im Life Science BereichTrends und Anwendungsbeispiele im Life Science Bereich
Trends und Anwendungsbeispiele im Life Science BereichAWS Germany
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...Neo4j
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...OSTHUS
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?Xpand IT
 
Choosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare OrganizationsChoosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare OrganizationsProvectus
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingDenodo
 
CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?Health Catalyst
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platformibemam
 
Namitha_Rajashekar_ Final
Namitha_Rajashekar_ FinalNamitha_Rajashekar_ Final
Namitha_Rajashekar_ FinalNamitha Raj
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo
 
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...Aridhia Informatics Ltd
 
Activate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge GraphActivate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge GraphDATAVERSITY
 

Ähnlich wie Managing R&D Data on Parallel Compute Infrastructure (20)

Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success Stories
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Delivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data FabricDelivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data Fabric
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Trends und Anwendungsbeispiele im Life Science Bereich
Trends und Anwendungsbeispiele im Life Science BereichTrends und Anwendungsbeispiele im Life Science Bereich
Trends und Anwendungsbeispiele im Life Science Bereich
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 
Choosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare OrganizationsChoosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare Organizations
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-Making
 
CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
 
Namitha_Rajashekar_ Final
Namitha_Rajashekar_ FinalNamitha_Rajashekar_ Final
Namitha_Rajashekar_ Final
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
 
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
 
Activate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge GraphActivate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge Graph
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes

Managing R&D Data on Parallel Compute Infrastructure

  • 1. Managing R&D data on parallel compute infrastructure. Prepared for the 2021 Data + AI Summit, April 6, 2021. Boston, +1 617 557 5800.
  • 2. Topics
      1. Introduction
      2. NGS data persistence strategies
      3. NGS mapping and alignment strategies
  • 3. ZS works closely with its clients to drive customer value and create impact across the organization
      - ZSers who are committed to helping active clients and their customers thrive 9,500+ 190+ clients have experienced ZS differentiation across 30 industries in over 90 countries, including: 1,200+
      - 80+ therapy areas of experience
      - 100% of the top 50 pharma are our clients
      - 90%+ of our work is in pharma and medtech
      - ZS is a Premier Databricks partner with a strong track record of enabling clients to take full advantage of data by deploying ZS’s proven assets and the Databricks Unified Data Analytics Platform to serve as a one-stop shop for all users involved in an end-to-end data engineering, data analytics, and data science pipeline.
  • 4. The ZS R&D Excellence team partners with clinical, medical and scientific clients to discover and develop innovative medicines that improve patients’ lives
      Our experts work side by side with clients, leveraging analytics and technology to create solutions that work in the real world from R&D to commercialization.
      R&D areas of excellence:
      - Biomedical research: scientific solutions; bioinformatics and in-silico solutions; scientific and research strategy; research and early development technology platforms
      - Real world evidence (RWE): integrated evidence strategy; real world data (RWD) strategy; observational research; rapid insight solutions; RWE benchmarking; evidence communication; actionable RWE; RWD science
      - Medical affairs: global evidence planning; medical org design; scientific communication strategy; medical science liaison design and support
      - Global health economics and outcomes research: economic modeling; value communication strategy; patient reported outcomes; literature review
      - Clinical development: trial optimization; quality risk monitoring; biometrics and clinical data tech solutions; site and patient engagement; digital and virtual strategy
      About ZS R&D:
      - 750+ professionals focused on R&D programs
      - 60+ million invested in R&D data, analytics & technology assets
      - Clinical Design Center
      - 50+: working with over 50 clients on R&D programs
  • 5. Problem statement
      Current research and development landscapes in biopharma have been plagued by years of not following the FAIR principles of data management. This has limited the ability to fully democratize the use of this data and stifled areas of drug development and artificial intelligence-based medicine.
  • 6. Moving raw data to the cloud for analysis
      Stages: LIMS/ELN; base calls; ETL; read preparation; analysis
      Mature:
      - LIMS/ELN: automation. Instrument to cloud (metadata capture, resilient key); systems integration (sample descriptions)
      - Base calls: conversion to FASTQ. Can be performed on instrument (custom processes); automatically move FASTQ or BCL to cloud
      - ETL: converting FASTQ to dataset objects. Defining common models for individual FASTQ reads; persisting these as data products
      - Read preparation: preprocessing. Dataset parallelism in executors (quality control; adapter and synthetic ligations trimmed)
      - Analysis: preprocessed data products. Mapping to reference data products; creation of specific data products for a pipeline
      Immature:
      - LIMS/ELN: manual. No integration to LIMS; metadata placed in file names
      - Base calls: files to demultiplex and convert on instrument. Sharing of files by email or personal share drives
      - ETL: no ETL. All data remains as compressed files
      - Read preparation: preprocessing. Run on a single node (limited parallelism; significantly long times; secondary raw data artifacts)
      - Analysis: run against raw reference/genomic features. Mapped to entire reference genome; significant number of useless artifacts generated
  • 7. Spark strategies for raw data ingestion
      Spark and Databricks in raw data ingestion:
      - Scalable clusters that allow for orders-of-magnitude improvements in run time
      - Object-oriented nature of Spark Datasets allows for specific versions of FASTQ headers
      - Spark Structured Streaming and stepwise analytics
      - Oxford Nanopore and FAST5 sequencing pipelines
      Figure panels:
      - Ingesting raw FASTQ to a dataset
      - Datasets are parallelizable and fit multiple platforms
      - Streaming analytics
      - From raw FASTQ to a structured dataset persisted to data lake
  • 8. Databricks as a platform for analysis
      Columns: Notebooks and user interaction; Jobs API
      - Data scientists, process developers and statisticians can interact with data products: controlling data quality through definitions of external tables; ad hoc analytics with creation of silver and bronze level data products; benchmarking, performance enhancements/experimental execution
      - Highly controlled processes can be created as traditional Spark application artifacts (.jar, .whl, .egg, etc.): data ingestion from raw sources; demultiplexing; controlled analytics pipelines
      - Notebooks can also be the sources and definitions of Spark execution jobs: concordance testing; visualizations and human-facing components
  • 9. Replacing traditional methods for creating scalable and high throughput time-sensitive pipelines
      Chart axes: mapping throughput (low to high); aggregation and analysis (low to high)
      Chart labels:
      - Step-wise execution from datasets
      - Creating data products that have matchable entities to direct sequence products
      - BWA, Bowtie, single node alignment methods
      - Implementation of matching and identification methods to Spark UDFs
  • 10. Use case for clinical pipeline: liquid biopsy NGS analysis
      Columns: LIMS, ELN, data transfer strategy; amplicon-based analysis; post-mapping and scientific analysis
      - Amplicon-based pipeline: anticipated onboarding tens of millions of patients; need analysis time per patient to be in the realm of 1-2
      - LIMS system integrated directly into the file transfer to cloud storage (ADLSv2, S3): metadata and sample information tracked by a resilient key; Databricks Delta Lake implementation for data products
      - Approximately 500,000 known sequence features were analyzed: data products generated directly from reference sources; products have reference, provenance and versioning metadata; mapping strategy that implements a UDF to determine sequence matches
      - Able to cut the mapping and amplicon identification process from four hours to less than one minute per patient: demultiplex flow-cell files and spin up one Databricks cluster per patient; stepwise job definitions allow joining executors at certain points in analysis
  • 11. Use case for clinical pipeline
      Pipeline stages: raw data ingest; trim adapters; identify and map; post hoc analysis; train ML for AI
      Technical value:
      - R&D data lake: co-located data; archival cloud storage; HA/DR
      - QC and prep reads: able to reuse methods that are open source; Databricks and other vendors creating utilities in Spark
      - Parallelization: magnitudes of scalability; data no longer tracked in databases and
      - Biologic relevance of data: use of scientific-friendly languages (Python, R); interface with bins
      - Enhance the ability to gain data value: data lakes; structured data catalogs
      Scientific value:
      - Consolidated lake: democratization of data; FAIR principles
      - Amplification strategies: quality assurance-based methods for amplification; unique parent barcoding
      - Novel methods: lower barrier to writing novel methods to support novel science
      - Scalable application of powerful open-source technologies: Bioconductor; SparkR
      - Growing structured and consolidated lake: variational autoencoders; machine learning (ML) models
  • 12. Topics
      - Introduction
      - NGS data persistence strategies
      - NGS mapping and alignment strategies
  • 13. Contact info
      Andrew S. Brown, Ph.D., Strategy & Architecture Manager
      https://www.linkedin.com/in/andrew-brown-73917014b/
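
The slides describe the pipeline only at a high level: FASTQ reads are turned into structured records, adapters are trimmed and reads quality-checked, and a UDF decides which known sequence feature a read matches. The following is a minimal single-process sketch of those steps in plain Python, not the presenters' implementation. The `FastqRead` schema, the function names, the exact-substring matching rule, the quality threshold and the example sequences are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class FastqRead:
    """One FASTQ record (hypothetical 'common model' for a single read)."""
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRead]:
    """Group every four non-blank lines (header, sequence, '+', quality) into a record."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # A trailing incomplete record (fewer than four lines) is ignored.
    for i in range(0, len(rows) - 3, 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at row {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at row {i + 1}")
        yield FastqRead(header[1:].split()[0], seq, qual)


def trim_adapter(read: FastqRead, adapter: str) -> FastqRead:
    """Cut the read at the first exact adapter occurrence (no mismatch tolerance)."""
    pos = read.sequence.find(adapter)
    if pos == -1:
        return read
    return FastqRead(read.read_id, read.sequence[:pos], read.quality[:pos])


def mean_phred(read: FastqRead, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 quality encoding."""
    if not read.quality:
        return 0.0
    return sum(ord(c) - offset for c in read.quality) / len(read.quality)


def match_feature(sequence: str, features: Dict[str, str]) -> Optional[str]:
    """Return the name of the first known feature contained in the read, else None."""
    for name, feature_seq in features.items():
        if feature_seq in sequence:
            return name
    return None


def process(lines: Iterable[str], adapter: str, features: Dict[str, str],
            min_quality: float = 20.0) -> List[dict]:
    """Parse, trim, quality-filter and map reads; returns one dict per kept read."""
    results = []
    for read in parse_fastq(lines):
        trimmed = trim_adapter(read, adapter)
        if mean_phred(trimmed) < min_quality:
            continue  # drop low-quality reads
        results.append({
            "read_id": trimmed.read_id,
            "sequence": trimmed.sequence,
            "feature": match_feature(trimmed.sequence, features),
        })
    return results


# Illustrative inputs only; the adapter and feature sequences are placeholders.
ADAPTER = "AGATCGGAAGAGC"
FEATURES = {"amplicon_A": "ACGTACGT", "amplicon_B": "TTGACCA"}
READ1_SEQ = "GGACGTACGTCC" + ADAPTER + "TT"
RAW = [
    "@read1 sample=demo", READ1_SEQ, "+", "I" * len(READ1_SEQ),  # high quality
    "@read2", "CCCCTTTTGGGG", "+", "!" * 12,                     # fails the quality filter
]

results = process(RAW, ADAPTER, FEATURES)
print(results)
```

On a Spark cluster, the per-read functions (`trim_adapter`, `mean_phred`, `match_feature`) are the kind of logic that could be wrapped as UDFs and applied to a distributed dataset of reads; how the talk's pipeline actually partitions the work is not stated in the slides.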