October 1st 2015
Opportunities for HPC in pharma R&D
A Pistoia Alliance webinar
Peter Coveney, Matt Gianni, and Darren Green
This webinar is being recorded
©PistoiaAlliance
Speakers
Prof Peter V. Coveney holds a chair in Physical Chemistry, is an
Honorary Professor in Computer Science, and is Director of the
Centre for Computational Science (CCS) at University College
London (UCL). Coveney is active in a broad area of interdisciplinary
research including condensed matter physics and chemistry,
materials science, as well as life and medical sciences in all of which
high performance computing plays a major role.
Matt Gianni is responsible for representing Cray’s solutions from a
technical and scientific perspective within the Life Science markets.
Over the past 15 years, Matt has focused on accelerating drug discovery
using computational technologies and has held key technical roles
with Elsevier, Symyx, MDL and Exelixis prior to joining Cray.
Darren Green is Director of Computational Chemistry,
GlaxoSmithKline. Based at Stevenage, his group specialises in the
application of molecular design, data analysis, predictive modelling
and chemoinformatics methods to drug discovery. Darren also leads
the Compound Collection Enhancement strategy for GSK. Darren
has a PhD in Theoretical Chemistry from the University of
Manchester. He is a Fellow of the Royal Society of Chemistry and a
member of the UK government’s e-Infrastructure Leadership Council.
Opportunities for HPC
in pharma R&D
Peter Coveney
Centre for Computational Science,
University College London
United Kingdom
Drug Screening
Searching for a needle in a haystack
To make use of HPC in pharma R&D:
• Predictions must be rapid, accurate and reproducible
• Requires high performance computing & automation
A virtual screening tool, a binding affinity calculator (BAC),
is able to reliably predict binding affinities of compounds with
target proteins, and can potentially be used as a drug ranking
tool in pharmaceutical lead discovery or in clinical application.
[Figure: black-box-like BAC that outputs a ranking of binding affinities]
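The ranking step can be illustrated with a minimal sketch. It assumes the calculator returns one predicted binding free energy per compound; the function name `rank_by_affinity`, the ligand names and the energy values are all hypothetical and are not taken from the BAC itself.

```python
from typing import Dict, List, Tuple


def rank_by_affinity(dg_kcal_per_mol: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order compounds from strongest to weakest predicted binder.

    A more negative binding free energy means tighter binding,
    so an ascending sort puts the best candidate first.
    """
    return sorted(dg_kcal_per_mol.items(), key=lambda item: item[1])


# Hypothetical predicted binding free energies (kcal/mol); not real results.
predicted = {"ligand_A": -9.8, "ligand_B": -12.1, "ligand_C": -7.4}

ranking = rank_by_affinity(predicted)
for position, (name, dg) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {dg:.1f} kcal/mol")
```

For lead discovery the order matters more than the absolute values, which is why the sketch returns a sorted list rather than a score.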
Virtual Screening Tools Based on Molecular Dynamics
S. K. Sadiq, D. Wright, S. J. Watson, S. J. Zasada, I. Stoica, and P. V. Coveney, "Automated
Molecular Simulation-Based Binding Affinity Calculator for Ligand-Bound HIV-1 Proteases", Journal of
Chemical Information and Modeling, 48, (9), 1909-1919, (2008), DOI: 10.1021/ci8000937.
The virtual screening tool requires a combination of hardware and software.
BAC: rapid and accurate binding affinity calculation on timescales relating to
pharmaceutical lead discovery.
The architecture is that of an HPC machine (either multicore or manycore/GPU based).
Rapid, Accurate, Reproducible and Automatic
Total of 10,000 cores on HPC/cloud resources required per study.
Reproducible: two independent studies of the same target and ligands produce identical results.
Less than 10 wallclock hours.
Drug Ranking with Schrödinger
Schrödinger products capable of binding affinity calculations:
• Desmond: High-performance molecular dynamics simulations for biomolecular systems
• FEP+: a rigorous approach for computing binding free energies that provides significant value to industrial drug discovery efforts
Both Desmond and FEP+ support GPGPUs.
The Schrödinger Suite is proprietary software and runs on low-end
GPU boards. FEP is one of the methods available for binding affinity prediction.
It has a limited domain of validity (congeneric series, same charges at both end
points, etc.), and reproducibility remains an issue.
Our own capabilities, which are based on a BAC for MMPBSA (molecular mechanics
Poisson-Boltzmann surface area) and TI (thermodynamic integration), address
reproducibility by requiring large-scale ensembles of molecular dynamics
calculations. This calls for HPC architectures, whether multicore or GPGPU:
we exploit big machines, because lower-end resources (whether GPU or multicore)
cannot support the turnaround required.
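Why ensembles help reproducibility can be sketched with a toy calculation. The replica values below are synthetic Gaussian noise around an assumed value, not molecular dynamics output, and the names `ensemble_estimate` and `synthetic_replicas` are hypothetical.

```python
import random
import statistics
from typing import List, Tuple


def ensemble_estimate(replica_dgs: List[float]) -> Tuple[float, float]:
    """Return the ensemble mean and its standard error over independent replicas."""
    if len(replica_dgs) < 2:
        raise ValueError("need at least two replicas to estimate the spread")
    mean = statistics.fmean(replica_dgs)
    sem = statistics.stdev(replica_dgs) / len(replica_dgs) ** 0.5
    return mean, sem


def synthetic_replicas(n: int, true_dg: float = -10.0,
                       spread: float = 1.5, seed: int = 0) -> List[float]:
    """Stand-in for n independent simulations: noisy draws around an assumed value."""
    rng = random.Random(seed)
    return [rng.gauss(true_dg, spread) for _ in range(n)]


for size in (5, 25, 100):
    mean, sem = ensemble_estimate(synthetic_replicas(size))
    print(f"{size:>3} replicas: dG = {mean:.2f} +/- {sem:.2f} kcal/mol")
```

For independent replicas the standard error falls roughly as one over the square root of the ensemble size, so two separate studies with large ensembles are far more likely to agree than two single runs. Running many replicas at once is what drives the demand for large machines.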
Automation & Integration of Services
The BAC workflow requires resources of different scales to execute
[Figure: workflow diagram. Labels: Project data warehouse; Coordinating Workflow Engine; BAC Prepare, BAC Simulate, BAC Post Process; EGI/Cloud resources; AHE; PRACE resources; Result; EUDAT Data Staging Services; long-term EUDAT storage]
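The staged workflow can be sketched as a chain of steps, each tagged with the class of resource it would run on. This is only an illustration: the stage functions are stubs, the `Stage` class and file names are hypothetical, and the assignment of stages to EGI/Cloud or PRACE resources is an assumption read off the diagram labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Stage:
    name: str
    resource: str                      # where the stage would be scheduled (assumed)
    run: Callable[[State], State]


def prepare(state: State) -> State:
    # Stub: build one simulation input per ligand.
    return {**state, "inputs": [f"{ligand}.in" for ligand in state["ligands"]]}


def simulate(state: State) -> State:
    # Stub: the expensive molecular dynamics step; here it only names its outputs.
    return {**state, "trajectories": [name.replace(".in", ".traj") for name in state["inputs"]]}


def post_process(state: State) -> State:
    # Stub: analysis that would turn trajectories into binding affinities.
    return {**state, "result": {traj: "pending analysis" for traj in state["trajectories"]}}


STAGES: List[Stage] = [
    Stage("BAC Prepare", "EGI/Cloud", prepare),
    Stage("BAC Simulate", "PRACE", simulate),
    Stage("BAC Post Process", "EGI/Cloud", post_process),
]


def run_workflow(stages: List[Stage], state: State) -> State:
    """Run the stages in order, handing each stage the previous stage's state."""
    for stage in stages:
        print(f"[{stage.resource}] {stage.name}")
        state = stage.run(state)
    return state


final_state = run_workflow(STAGES, {"ligands": ["lig1", "lig2"]})
```

Keeping the resource label on each stage lets a coordinating engine send the small preparation and analysis jobs to cloud resources and reserve the large allocation for the simulation step.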
Genome Sequencing:
1 Human Genome in:
~5 years (2001)
2 years (2004)
4 days (Jan 2008)
16 Hours (Oct 2008)
3 Hours (Nov 2009)
6 minutes (recent)
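The scale of this speed-up is easier to see as ratios. The sketch below converts the quoted times to minutes, assuming 365-day years and treating "~5 years" as exactly five, so the factors are approximate.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # assumption: 365-day years

# Time to sequence one human genome, as quoted on the slide, in minutes.
timeline = [
    ("2001", 5 * MINUTES_PER_YEAR),
    ("2004", 2 * MINUTES_PER_YEAR),
    ("Jan 2008", 4 * MINUTES_PER_DAY),
    ("Oct 2008", 16 * MINUTES_PER_HOUR),
    ("Nov 2009", 3 * MINUTES_PER_HOUR),
    ("recent", 6),
]

baseline_minutes = timeline[0][1]
speedups = {label: baseline_minutes / minutes for label, minutes in timeline}

for label, factor in speedups.items():
    print(f"{label:>8}: {factor:,.0f}x faster than 2001")
```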
Big Data in Biomedicine & Healthcare
Use of HPC in the context of genomics
and gene sequencing
• Electronic Health Records
• Integration of omics & imaging data
Requires rapid development of computational
science and informatics capabilities to deal
with management and analysis of data.
New Machines
Cray Solutions for Life Sciences
Healthcare Provider: The Promise of Precision Treatment
Cray® XK7™ supercomputer
Cray’s Urika™ platform
Case study:
Oak Ridge National Laboratory (ORNL) is using
computing to delve deeper into big health data and
is proposing innovative solutions to grand
challenges in the country's health care system.
ORNL researchers are using Titan to simulate
outcomes of interventions, Urika for pattern
discovery, and cloud computing to understand what
happened.
Petascale Computing Facilities Used by Us
Kraken, Stampede, Lonestar, Anton, HECToR, PRACE, ARCHER, EMERALD, Blue Joule, Blue Wonder, GPGPU cluster
Darren Green, GSK
C O M P U T E | S T O R E | A N A L Y Z E
Pistoia Alliance 2015
Oct 1, 2015
About Cray
Cray Inc.
Seymour Cray founded Cray Research in 1972
• 1972-1996: Cray Research grew to leadership in supercomputing
• 1996-2000: Cray was a subsidiary of SGI
• April 2000: Cray Inc. formed
• 2000-present: Cray Inc. grew to $525M in revenue in 2013
Cray Inc.
• NASDAQ: CRAY
• Over 1,000 employees across 30 countries
• Headquartered in Seattle, WA
Three Focus Areas
• Computation
• Storage
• Analytics
Seven Major
Development Sites:
• Austin, TX
• Chippewa Falls, WI
• Pleasanton, CA
• St. Paul, MN
• San Jose, CA
• Seattle, WA
• Bristol, UK
Cray’s Vision:
The Fusion of Supercomputing and Big & Fast Data
Modeling The World
Cray Supercomputers solving “grand challenges” in science, engineering and analytics
Compute Store Analyze
Data-Intensive Processing: high throughput event processing & data capture from sensors, data feeds and instruments
Math Models: modeling and simulation augmented with data to provide the highest fidelity virtual reality results
Data Models: integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery
Cray Inc.
Cray Product Range and LS Applicability
• Aries Interconnect
• Scalability
• Package density
• Accelerators
• Upgradeability
• Integrated Stack
• Best in class power and cooling
• Accelerator density
• Proven at scale
• Integrated h/w and s/w stack
• Developer productivity
CS400/Storm Cluster Supercomputer
XC40 Supercomputer
• Molecular Modeling
• Structural Biology
• Machine Learning
• NGS
• Bioinformatics
• Image Analysis
• Molecular Modeling
• Structural Biology
• Machine Learning
• NGS
• Bioinformatics
• Image Analysis
Unprecedented Scalability
Source: Jim Phillips, SC’12
Image: http://www.ks.uiuc.edu/Research/vmd/minitutorials/gelato/
Cray Inc. Proprietary
10/1/2015
Satellite Tobacco Mosaic Virus
Applying HPC Best Practice to Speed Up MegaSeq
Megan Puckelwartz, et al. (University of Chicago)
exploit the fact that the cost of sequencing an entire
human genome is now moving into the range where it is
being broadly applied in both the research and clinical
settings
http://beagle.ci.uchicago.edu/science-at-beagle/
Puckelwartz M J et al. Bioinformatics
2014;bioinformatics.btu071
Parallelization
Project: Parallelizing Inchworm
Trinity: software tool developed for de novo reconstruction of transcriptomes
from RNA-seq data
Inchworm uses a greedy search on a k-mer graph to assemble sequence contigs
Chrysalis bundles the contigs and builds individual de Bruijn graphs
Butterfly computes the final assembly
M. G. Grabherr, et al., Nat. Biotechnol.; 29(7): 644-652, 2011
http://trinityrnaseq.sourceforge.net/
Dr Pierre Carrier, Dr Carlos P Sosa, Dr Bill Long, Dr Brian Haas,
Dr Timothy Tickle
R. Henschel, et al., Trinity RNA-Seq assembler performance optimization. XSEDE 2012 Proceedings of the 1st Conference of the Extreme Science and
Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
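The greedy k-mer extension attributed to Inchworm above can be sketched in a few lines. This is a simplified illustration and not Trinity's implementation: it ignores reverse complements, sequencing errors and abundance thresholds, and the function names and reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_contigs(reads: Iterable[str], k: int) -> List[str]:
    """Seed with the most abundant unused k-mer, then extend greedily both ways."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counts = count_kmers(reads, k)
    contigs = []
    while counts:
        seed, _ = counts.most_common(1)[0]
        del counts[seed]
        contig = seed
        # Extend to the right: pick the most abundant k-mer overlapping the last k-1 bases.
        while True:
            suffix = contig[-(k - 1):]
            candidates = [suffix + base for base in "ACGT" if suffix + base in counts]
            if not candidates:
                break
            best = max(candidates, key=lambda kmer: counts[kmer])
            contig += best[-1]
            del counts[best]          # each k-mer is used at most once
        # Extend to the left in the same way, using the first k-1 bases.
        while True:
            prefix = contig[:k - 1]
            candidates = [base + prefix for base in "ACGT" if base + prefix in counts]
            if not candidates:
                break
            best = max(candidates, key=lambda kmer: counts[kmer])
            contig = best[0] + contig
            del counts[best]
        contigs.append(contig)
    return contigs


# Hypothetical overlapping reads drawn from the sequence ATGGCGTGCA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
print(greedy_contigs(reads, k=4))
```

Each seed-and-extend pass walks a shared k-mer table, so the passes depend on one another in this naive form, which is part of what makes parallelizing the step non-trivial.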
Cray Product Range and LS Applicability
Lustre Parallel File System
• Lustre parallel file system
• Single POSIX namespace
• Modular scaling 7.5GB/s-1.7TB/s
• Integrated and preconfigured
• Reliability and availability at scale
• Improved scalability
• Converged storage across grid, analytics, Hadoop
• Storage layer for Cassandra, Spark, RDB
• Improve I/O
Archive
• Multi tier single namespace archive
• Rule based policy migration
• Flexible integration with most OEM tape and disk
• Preconfigured and integrated
• Data Lake archival
• Analytical data archival
• Market data archival
• Data no longer ‘deep sixed’
• NGS data archive
Cray Product Range and LS Applicability
Urika-GD Graph Discovery Appliance
• Most scalable graph processor available
• Whole graph analytics possible
• Open RDF/SPARQL
• Single memory space and extreme threaded processor
• Precision Medicine
• Drug Repurposing
• Cybersecurity
• Data Integration
• Cohort Selection
Urika-XA Extreme Analytics Platform
• Cloudera 5.2/Yarn
• Open to non CDH apps
• Dense compute and memory
• SSD layer for HDFS
• Lustre/POSIX for scale out storage
• Spark optimized
• R/T streaming analytics converged with regular analytics
• Machine learning
• NGS Workflow and Analytics
Life Science Market and Technology Drivers
Data Science: new data sources and emerging analytical approaches to enable predictive modeling and knowledge discovery
Rise of High Performance Analytics: convergence of analytics and supercomputing opening new opportunities to meet the pace of discovery
Cluster Sprawl: ad-hoc cluster infrastructures increasing complexity, reliability and usability challenges
Pace of Technology: struggling to keep compute infrastructures current with rapidly changing life sciences technologies
Precision Medicine: race to understand patients, diseases and treatments at the molecular level
The Quest for In-Time Analytics
[Figure: response timeframes, from <30 ms through 30 ms and 10 min to >10 min, spanning low-latency to batch workloads and streaming to stationary data. Annotations: business analysts accustomed to interactive time frames; few data scientists who wrangle data]
Low-latency applications require performance optimizations
• Memory-storage hierarchies
• Fast interconnects
Explosion in Data Volume, Variety and
Complexity
ELN
Medical Records
Existing tools are failing to keep up
Modern NGS Multi-Step Analytics Pipelines
Next Generation Sequencers
Data Prep/Acquisition
• mRNA
• miRNA
• Protein
• SNP
• Metabolite
Base Analytics
• Background Correction
• Normalization
• QC
• SNP call
Contextualization
• dbSNP
• ClinVar
• Annovar
• Uniprot
• Biobank/LIMS
Advanced Analytics
• Correlation analysis
• Regression
• Hypothesis testing
• Visualization
Actionable Insight
Data sources: Trial Data, Drug Data, “Big Data”, Patient Data
Cray Multi-Step Analytics Pipelines: Manage all aspects of NGS pipeline in one environment
Data Prep/Acquisition
• mRNA
• miRNA
• Protein
• SNP
• Metabolite
Base Analytics
• Background Correction
• Normalization
• QC
• SNP call
Data Integration
• dbSNP
• ClinVar
• Annovar
• Uniprot
• Biobank/LIMS
Advanced Analytics
• Correlation analysis
• Regression
• Hypothesis testing
• Visualization
Actionable Insight
Apache Spark Enables Modern Bioinformatics
MLlib
SQL
Connecting to the Enterprise
Big Data Platform
BI on Hadoop, BI and visualization, Advanced analytics, Data transformation
Data sources
Urika-XA enables Spark
• Memory - Urika-XA is configured with 6 TB per rack, which supports complex NGS workflows and provides the freedom to model data based on the requirements of the analysis, as opposed to the limitations of the machine
• Compute - Urika-XA provides over 1,500 cores per rack, bringing complex analysis of big data to interactive time scales
• Network - Urika-XA’s high-speed interconnects accelerate complex data joins and graph analytics at scale
• Storage
  • Lustre - 120 TB of global POSIX-compliant file system
  • SSD - 38 TB of high-speed local SSD storage
BioDT is an Open Platform
• Users aren’t relegated to a limited set of proprietary tools
• Includes 250+ popular tools, including tools from the Galaxy and GATK libraries
• Supports ADAM
• Easy to add new tools
• Easy to optimize tools for Hadoop
• Easy to search tools
• Tools can be R, Perl, or Python scripts
Lumenogix Bioinformatics-in-a-Box™ with Urika-XA
Whole Human Genome in 45 minutes
[Chart: time to process a 50x whole human genome, in minutes. AWS: 164; Urika-XA: 45]
Process                Time
BWA                    17 minutes
Tag & Shuffle Reads    2 minutes
Sort and Compress      1 minute
Mark Duplicates        1 minute
Realignment            6 minutes
Genotyping             18 minutes
Total                  45 minutes
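A quick check of these figures, using only the numbers quoted on the slide. The variable names are illustrative.

```python
# Per-stage wall-clock times on Urika-XA, in minutes, as quoted on the slide.
stage_minutes = {
    "BWA": 17,
    "Tag & Shuffle Reads": 2,
    "Sort and Compress": 1,
    "Mark Duplicates": 1,
    "Realignment": 6,
    "Genotyping": 18,
}
aws_minutes = 164  # quoted time for the same 50x genome on AWS

total_minutes = sum(stage_minutes.values())
speedup = aws_minutes / total_minutes
share = {stage: minutes / total_minutes for stage, minutes in stage_minutes.items()}

print(f"Total: {total_minutes} minutes, {speedup:.2f}x faster than AWS")
for stage, fraction in sorted(share.items(), key=lambda item: item[1], reverse=True):
    print(f"{stage:<20} {fraction:6.1%}")
```

The stage times do add up to the quoted 45 minutes. Alignment (BWA) and genotyping together account for 35 of them, so those two stages are where further tuning would pay off most.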
In a Cancer Biology Research Project
● Understanding the relationship between genes: does gene X regulate the expression of gene Y?
● How do mutations affect these relationships?
● What is the effect on the cell cycle?
● What are the effects on genome stability?
Original t-test analysis using R, running on a gene-by-gene basis
• A single gene takes ~1-3 minutes to analyze
• At this rate it would take 25 days to complete the entire 36K sample experiment
Using Spark on Urika-XA
• Implemented the t-test using Scala and the Apache Commons Mathematics Library, in parallel, across 1,500 cores
• Completing the entire experiment in under 20 minutes
• Bioinformatician-friendly code
• Interactive environment
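Both the 25-day estimate and the reason the job parallelizes well can be sketched briefly. The sketch uses Python's standard library rather than the Scala, Spark and Apache Commons Math stack described above. The Welch form of the t statistic is an assumption, since the slide says only "t-test", and the gene names and expression values are hypothetical.

```python
import statistics
from typing import Callable, Dict, Sequence, Tuple

Groups = Tuple[Sequence[float], Sequence[float]]


def serial_days(n_tests: int, minutes_per_test: float) -> float:
    """Wall-clock days needed if the tests run one after another."""
    return n_tests * minutes_per_test / (60 * 24)


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / (var_a / len(group_a) + var_b / len(group_b)) ** 0.5


def t_by_gene(expression: Dict[str, Groups], map_fn: Callable = map) -> Dict[str, float]:
    """Apply the test to every gene; each gene is independent of the others."""
    genes = list(expression)
    t_values = map_fn(lambda gene: welch_t(*expression[gene]), genes)
    return dict(zip(genes, t_values))


# Hypothetical expression values for two conditions per gene; not real data.
expression = {
    "gene_X": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    "gene_Y": ([5.0, 5.5, 6.0], [5.1, 5.4, 6.2]),
}

print(f"Serial estimate: {serial_days(36_000, 1):.0f} days at 1 minute per test")
print(t_by_gene(expression))
```

At the low end of the quoted range, 36,000 one-minute tests run serially take 25 days, which matches the slide. Because no gene's test depends on another's, `map_fn` can be swapped for any parallel map with the same call shape, for example `concurrent.futures.ThreadPoolExecutor().map`. Spark applies the same independence across a cluster.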
Analytics Solutions
Powered By
Extreme Analytics Platform
• Turnkey Advanced Analytics Platform
• Next-Generation System Architecture
• Engineered for Performance
Graph Discovery Appliance
• Discover Unknown & Hidden
Relationships in Big Data
• Real-time Data Discovery
• Realize Rapid Time-to-Value
Thank you
Panel & audience discussion
Please enter your questions into the question or chat boxes
info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org
Thank you for your attention

 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the futurePistoia Alliance
 
PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences Pistoia Alliance
 
AI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoEAI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoEPistoia Alliance
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Pistoia Alliance
 
Blockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab SlidesBlockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab SlidesPistoia Alliance
 

Mehr von Pistoia Alliance (20)

Fairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matrices
 
MPS webinar master deck
MPS webinar master deckMPS webinar master deck
MPS webinar master deck
 
Digital webinar master deck final
Digital webinar master deck finalDigital webinar master deck final
Digital webinar master deck final
 
Heartificial intelligence - claudio-mirti
Heartificial intelligence - claudio-mirtiHeartificial intelligence - claudio-mirti
Heartificial intelligence - claudio-mirti
 
Fair by design
Fair by designFair by design
Fair by design
 
Knowledge graphs ilaria maresi the hyve 23apr2020
Knowledge graphs   ilaria maresi the hyve 23apr2020Knowledge graphs   ilaria maresi the hyve 23apr2020
Knowledge graphs ilaria maresi the hyve 23apr2020
 
Data market evolution, a future shaped by FAIR
Data market evolution, a future shaped by FAIRData market evolution, a future shaped by FAIR
Data market evolution, a future shaped by FAIR
 
AI in translational medicine webinar
AI in translational medicine webinarAI in translational medicine webinar
AI in translational medicine webinar
 
CEDAR work bench for metadata management
CEDAR work bench for metadata managementCEDAR work bench for metadata management
CEDAR work bench for metadata management
 
Open interoperability standards, tools and services at EMBL-EBI
Open interoperability standards, tools and services at EMBL-EBIOpen interoperability standards, tools and services at EMBL-EBI
Open interoperability standards, tools and services at EMBL-EBI
 
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
 
Implementing Blockchain applications in healthcare
Implementing Blockchain applications in healthcareImplementing Blockchain applications in healthcare
Implementing Blockchain applications in healthcare
 
Building trust and accountability - the role User Experience design can play ...
Building trust and accountability - the role User Experience design can play ...Building trust and accountability - the role User Experience design can play ...
Building trust and accountability - the role User Experience design can play ...
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
 
PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences
 
AI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoEAI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoE
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019
 
Blockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab SlidesBlockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab Slides
 

Kürzlich hochgeladen

Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxANSARKHAN96
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsbassianu17
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxSilpa
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 

Kürzlich hochgeladen (20)

Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 

Opportunities for HPC in pharma R&D - main deck

  • 1. October 1st 2015 Opportunities for HPC in pharma R&D A Pistoia Alliance webinar Peter Coveney, Matt Gianni, and Darren Green
  • 2. This webinar is being recorded
  • 3. ©PistoiaAlliance Speakers Opportunities for HPC in pharma R&D 1st October 2015 Prof Peter V. Coveney holds a chair in Physical Chemistry, is an Honorary Professor in Computer Science, and is Director of the Centre for Computational Science (CCS) at University College London (UCL). Coveney is active in a broad area of interdisciplinary research including condensed matter physics and chemistry, materials science, and the life and medical sciences, in all of which high performance computing plays a major role. Matt Gianni is responsible for representing Cray’s solutions from a technical and scientific perspective within the Life Science markets. Over the past 15 years, Matt has focused on accelerating drug discovery using computational technologies and has held key technical roles with Elsevier, Symyx, MDL and Exelixis prior to joining Cray. Darren Green is Director of Computational Chemistry, GlaxoSmithKline. Based at Stevenage, his group specialises in the application of molecular design, data analysis, predictive modelling and chemoinformatics methods to drug discovery. Darren also leads the Compound Collection Enhancement strategy for GSK. Darren has a PhD in Theoretical Chemistry from the University of Manchester. He is a Fellow of the Royal Society of Chemistry and a member of the UK government’s e-Infrastructure Leadership Council.
  • 4. Opportunities for HPC in pharma R&D Peter Coveney Centre for Computational Science, University College London United Kingdom
  • 5. Drug Screening: Searching for a needle in a haystack. To make use of HPC in pharma R&D: • Predictions must be rapid, accurate and reproducible • Requires high performance computing & automation
  • 6. Virtual Screening Tools Based on Molecular Dynamics. A virtual screening tool, a binding affinity calculator (BAC), is able to reliably predict binding affinities of compounds with target proteins, and can potentially be used as a drug ranking tool in pharmaceutical lead discovery or in clinical application. Blackbox-like BAC: ranking of binding affinities. The virtual screening tool requires a combination of hardware and software. Reference: S. K. Sadiq, D. Wright, S. J. Watson, S. J. Zasada, I. Stoica, and P. V. Coveney, "Automated Molecular Simulation-Based Binding Affinity Calculator for Ligand-Bound HIV-1 Proteases", Journal of Chemical Information and Modeling, 48 (9), 1909-1919 (2008), DOI: 10.1021/ci8000937.
  • 7. Rapid, Accurate, Reproducible and Automatic. BAC: rapid and accurate binding affinity calculation on timescales relating to pharmaceutical lead discovery. The architecture is that of an HPC machine (either multicore or manycore/GPU based). Total of 10,000 cores on HPC/cloud resources required per study. Less than 10 wallclock hours. Reproducible: two independent studies of the same target and ligands produce identical results.
  • 8. Drug Ranking with Schrödinger. Schrödinger products with capability of binding affinity calculations: • Desmond: high-performance molecular dynamics simulations for biomolecular systems • FEP+: a rigorous approach for computing binding free energies that provides significant value to industrial drug discovery efforts. Both Desmond and FEP+ support GPGPUs. The Schrödinger Suite comprises (proprietary) software and runs on low-end GPU boards. FEP is one of the methods available for binding affinity prediction. It has a limited domain of validity (congeneric series, same charges at both end points, etc.); reproducibility remains an issue. Our own capabilities, which are based on a BAC for “MMPBSA” and TI, address reproducibility by performing large-scale ensembles of molecular dynamics calculations. This calls for HPC architectures, whether multicore or GPGPU: we exploit big machines, not lower-end resources, which (whether GPU or multicore) cannot support the turnaround required.
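The ensemble approach mentioned in slide 8 can be illustrated with a minimal sketch. It assumes that each independent molecular dynamics replica has already produced one binding free energy estimate; the replica values, the function name `ensemble_estimate`, and the use of a bootstrap for the error bar are illustrative assumptions, not details taken from the BAC itself.

```python
import random
import statistics


def ensemble_estimate(replica_energies, n_boot=2000, seed=0):
    """Return the ensemble mean and a bootstrap standard error.

    replica_energies: one binding free energy estimate per replica.
    The fixed seed makes the error estimate itself repeatable.
    """
    rng = random.Random(seed)
    n = len(replica_energies)
    mean = statistics.fmean(replica_energies)
    # Resample the replicas with replacement and record each resampled mean.
    boot_means = [
        statistics.fmean(rng.choices(replica_energies, k=n))
        for _ in range(n_boot)
    ]
    return mean, statistics.stdev(boot_means)


# Hypothetical per-replica estimates in kcal/mol (not data from the talk).
replicas = [-9.8, -10.4, -9.1, -10.9, -9.6, -10.2, -9.9, -10.5]
mean, err = ensemble_estimate(replicas)
print(f"{mean:.2f} +/- {err:.2f} kcal/mol")
```

Reporting the mean together with an error bar is what allows two independent studies to be compared: single-trajectory estimates can differ noticeably from run to run, whereas ensemble averages should agree within their stated uncertainty.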
  • 9. Automation & Integration of Services. The BAC workflow requires resources of different scales to execute. Diagram components: project data warehouse; coordinating workflow engine; BAC Prepare, BAC Simulate, BAC Post Process; EGI/Cloud resources; AHE; PRACE resources; result; EUDAT data staging services; long-term EUDAT storage.
  • 10. Big Data in Biomedicine & Healthcare. Use of HPC in the context of genomics and gene sequencing • Genome sequencing, 1 human genome in: ~5 years (2001), 2 years (2004), 4 days (Jan 2008), 16 hours (Oct 2008), 3 hours (Nov 2009), 6 minutes (recent) • Electronic Health Records • Integration of omics & imaging data. Requires rapid development of computational science and informatics capabilities to deal with management and analysis of data. New Machines
  • 11. Cray Solutions for Life Sciences. Healthcare Provider: The Promise of Precision Treatment. Cray® XK7™ supercomputer; Cray’s Urika™ platform. Case study: Oak Ridge National Laboratory (ORNL) is using computing to delve deeper into big health data and is proposing innovative solutions to grand challenges in the country's health care system. ORNL researchers are using Titan to simulate outcomes of interventions, Urika for pattern discovery, and cloud computing to understand what happened.
  • 12. Petascale Computing Facilities Used by Us: Kraken, Stampede, Lonestar, Anton, HECToR, PRACE, ARCHER, EMERALD, Blue Joule, Blue Wonder, GPGPU cluster
  • 14. C O M P U T E | S T O R E | A N A L Y Z E Pistoia Alliance 2015 Oct 1, 2015
  • 15. About Cray. Seymour Cray founded Cray Research in 1972 • 1972-1996, Cray Research grew to leadership in supercomputing • 1996-2000, Cray was a subsidiary of SGI • Cray Inc. formed in April 2000 • 2000-present, Cray Inc. growing to $525M in revenue in 2013. Cray Inc. • NASDAQ: CRAY • Over 1,000 employees across 30 countries • Headquartered in Seattle, WA. Three Focus Areas • Computation • Storage • Analytics. Seven Major Development Sites: • Austin, TX • Chippewa Falls, WI • Pleasanton, CA • St. Paul, MN • San Jose, CA • Seattle, WA • Bristol, UK
  • 16. Cray’s Vision: The Fusion of Supercomputing and Big & Fast Data. Modeling The World: Cray Supercomputers solving “grand challenges” in science, engineering and analytics. Compute, Store, Analyze. Data-Intensive Processing: high throughput event processing & data capture from sensors, data feeds and instruments. Math Models: modeling and simulation augmented with data to provide the highest fidelity virtual reality results. Data Models: integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery.
  • 17. Cray Product Range and LS Applicability • Aries Interconnect • Scalability • Package density • Accelerators • Upgradeability • Integrated Stack • Best in class power and cooling • Accelerator density • Proven at scale • Integrated h/w and s/w stack • Developer productivity CS400/Storm Cluster Supercomputer XC40 Supercomputer • Molecular Modeling • Structural Biology • Machine Learning • NGS • Bioinformatics • Image Analysis • Molecular Modeling • Structural Biology • Machine Learning • NGS • Bioinformatics • Image Analysis
  • 18. Unprecedented Scalability. Satellite Tobacco Mosaic Virus. Source: Jim Phillips, SC’12. Image: http://www.ks.uiuc.edu/Research/vmd/minitutorials/gelato/
  • 19. Applying HPC Best Practice to Speed Up MegaSeq. Parallelization. Megan Puckelwartz, et al. (University of Chicago) exploit the fact that the cost of sequencing an entire human genome is now moving into the range where it is being broadly applied in both the research and clinical settings. http://beagle.ci.uchicago.edu/science-at-beagle/ Puckelwartz M J et al. Bioinformatics 2014;bioinformatics.btu071
  • 20. Project: Parallelizing Inchworm. Trinity: software tool developed for de novo reconstruction of transcriptomes from RNA-seq data (M. G. Grabherr, et al., Nat. Biotechnol.; 29(7): 644-652, 2011). Inchworm uses a greedy search on the k-mer graph to assemble sequence contigs. Chrysalis bundles the contigs and builds individual de Bruijn graphs. Butterfly computes the final assembly. http://trinityrnaseq.sourceforge.net/ Dr Pierre Carrier, Dr Carlos P Sosa, Dr Bill Long, Dr Brian Haas, Dr Timothy Tickle. R. Henschel, et al., Trinity RNA-Seq assembler performance optimization. XSEDE 2012 Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
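Slide 20 describes Inchworm as a greedy search over k-mers. The sketch below is a deliberately simplified illustration of that idea, not Trinity's implementation: it counts k-mers, seeds a contig with the most abundant one, and repeatedly extends it in both directions with the most abundant unused overlapping k-mer. The function names and the toy reads are assumptions made for this example, and error filtering and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_contig(counts, k):
    """Build one contig by greedy extension from the most abundant k-mer."""
    if k < 2 or not counts:
        return ""
    # Sorting first makes the choice deterministic when counts tie.
    seed = max(sorted(counts), key=counts.__getitem__)
    used = {seed}
    contig = seed

    def best_unused(candidates):
        live = [c for c in candidates if counts.get(c, 0) > 0 and c not in used]
        return max(live, key=counts.__getitem__) if live else None

    # Extend to the right: candidates share the contig's last k-1 bases.
    while True:
        suffix = contig[-(k - 1):]
        nxt = best_unused([suffix + base for base in "ACGT"])
        if nxt is None:
            break
        used.add(nxt)
        contig += nxt[-1]

    # Extend to the left: candidates share the contig's first k-1 bases.
    while True:
        prefix = contig[:k - 1]
        nxt = best_unused([base + prefix for base in "ACGT"])
        if nxt is None:
            break
        used.add(nxt)
        contig = nxt[0] + contig

    return contig


# Toy overlapping reads (hypothetical data).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
contig = greedy_contig(count_kmers(reads, 4), 4)
print(contig)
```

Each extension step depends on the previous one, which hints at why the greedy stage is harder to parallelize than the k-mer counting that precedes it.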
  • 21. Cray Product Range and LS Applicability • Lustre parallel file system • Single POSIX namespace • Modular scaling 7.5 GB/s to 1.7 TB/s • Integrated and preconfigured • Reliability and availability at scale • Multi tier single namespace archive • Rule based policy migration • Flexible integration with most OEM tape and disk • Preconfigured and integrated Archive Lustre Parallel File System • Improved Scalability • Converged storage across grid, analytics, Hadoop • Storage layer for Cassandra, Spark, RDB • Improve I/O • Data Lake archival • Analytical data archival • Market data archival • Data no longer ‘deep sixed’ • NGS data archive
  • 22. Cray Product Range and LS Applicability • Most scalable graph processor available • Whole graph analytics possible • Open RDF/SPARQL • Single memory space and extreme threaded processor Urika-GD Graph Discovery Appliance • Precision Medicine • Drug Repurposing • Cybersecurity • Data Integration • Cohort Selection • Cloudera 5.2/YARN • Open to non CDH apps • Dense compute and memory • SSD layer for HDFS • Lustre/POSIX for scale out storage Urika-XA Extreme Analytics Platform • Spark optimized • R/T streaming analytics converged with regular analytics • Machine learning • NGS Workflow and Analytics
  • 23. Life Science Market and Technology Drivers • Data Science: new data sources and emerging analytical approaches to enable predictive modeling and knowledge discovery • Rise of High Performance Analytics: convergence of analytics and supercomputing opening new opportunities to meet the pace of discovery • Cluster Sprawl: ad-hoc cluster infrastructures increasing complexity, reliability and usability challenges • Pace of Technology: struggling to keep compute infrastructures current with rapidly changing life sciences technologies • Precision Medicine: race to understand patients, diseases and treatments at the molecular level
  • 24. The Quest for In-Time Analytics. Response time frames: <30 ms, 30 ms, 10 min, >10 min. Low-latency and batch; streaming data and stationary data. Few data scientists who wrangle data; business analysts accustomed to interactive time frames. Low-latency applications require performance optimizations • Memory-storage hierarchies • Fast interconnects
  • 25-28. Explosion in Data Volume, Variety and Complexity (ELN, Medical Records)
  • 29. Existing tools are failing to keep up
  • 30. Modern NGS Multi-Step Analytics Pipelines Next Generation Sequencers • mRNA • miRNA • Protein • SNP • Metabolite Data Prep/Acquisition • Background Correction • Normalization • QC • SNP call Base Analytics • dbSNP • ClinVar • Annovar • Uniprot • Biobank/LIMS Contextualization • Correlation analysis • Regression • Hypothesis testing • Visualization Advanced Analytics Actionable Insight Trial Data Drug Data “Big Data” Patient Data
  • 31. Cray Multi-Step Analytics Pipelines: Manage all aspects of NGS pipeline in one environment. Data Prep/Acquisition • mRNA • miRNA • Protein • SNP • Metabolite Base Analytics • Background Correction • Normalization • QC • SNP call Data Integration • dbSNP • ClinVar • Annovar • Uniprot • Biobank/LIMS Advanced Analytics • Correlation analysis • Regression • Hypothesis testing • Visualization Actionable Insight
  • 32. Apache Spark Enables Modern Bioinformatics: MLlib, SQL
  • 33. Connecting to the Enterprise Big Data Platform: BI on Hadoop, BI and visualization, advanced analytics, data transformation, data sources
  • 34. Urika-XA enables Spark • Memory: Urika-XA is configured with 6 TB per rack, which supports complex NGS workflows and provides the freedom to model data based on the requirements of the analysis, as opposed to the limitations of the machine • Compute: Urika-XA provides over 1,500 cores per rack, bringing complex analysis of big data to interactive time scales • Network: Urika-XA’s high speed interconnects accelerate complex data joins and graph analytics at scale • Storage: Lustre, 120 TB of global POSIX compliant file system; SSD, 38 TB of high speed local SSD storage
  • 35. BioDT is an Open Platform • Users aren’t relegated to a limited set of proprietary tools • Includes 250+ popular tools, including tools from the Galaxy and GATK libraries • Supports ADAM • Easy to add new tools • Easy to optimize tools for Hadoop • Easy to search tools • Tools can be R, Perl, or Python scripts
  • 36. Lumenogix Bioinformatics-in-a-Box™ with Urika-XA: Whole Human Genome in 45 minutes. Time to process a 50x whole human genome: AWS 164 minutes, Urika-XA 45 minutes. Process times: BWA 17 minutes; Tag & Shuffle Reads 2 minutes; Sort and Compress 1 minute; Mark Duplicates 1 minute; Realignment 6 minutes; Genotyping 18 minutes; Total 45 minutes
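The figures on slide 36 can be cross-checked with a few lines of arithmetic. This sketch only restates the numbers given on the slide; it reads the chart as 164 minutes on AWS against 45 minutes on Urika-XA, which is an interpretation of the flattened chart labels.

```python
# Per-step process times in minutes, as listed on the slide.
steps = {
    "BWA": 17,
    "Tag & Shuffle Reads": 2,
    "Sort and Compress": 1,
    "Mark Duplicates": 1,
    "Realignment": 6,
    "Genotyping": 18,
}

total_minutes = sum(steps.values())      # should match the stated total
aws_minutes = 164                        # chart value read as the AWS run
speedup = aws_minutes / total_minutes    # ratio of the two chart values

# Share of the run spent in the two dominant steps.
dominant_share = (steps["BWA"] + steps["Genotyping"]) / total_minutes

print(total_minutes, round(speedup, 2), round(dominant_share, 2))
```

The step times do add up to the stated total, and alignment (BWA) plus genotyping account for most of the run, so those two steps would be the natural targets for any further optimization.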
  • 37. In a Cancer Biology Research Project ● Understanding the relationship between genes: does gene X regulate the expression of gene Y? ● How do mutations affect these relationships? ● What is the effect on the cell cycle? ● What are the effects on genome stability? Original t-test analysis using R running on a gene-by-gene basis • A single gene takes ~1-3 minutes to analyze • At this rate it would take 25 days to complete the entire 36K sample experiment. Using Spark on Urika-XA • Implemented the t-test using Scala and the Apache Commons Mathematics Library, in parallel, across 1,500 cores • Completing the entire experiment in under 20 minutes • Bioinformatician-friendly code • Interactive environment
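The 25-day figure on slide 37 is consistent with the lower bound of the stated per-gene time: 36,000 tests at 1 minute each is 36,000 minutes, which is exactly 25 days. Because each gene is tested independently, the work maps cleanly onto many cores. The sketch below shows that pattern in plain Python rather than the Scala and Spark stack used in the talk; the choice of Welch's t statistic, the function names, and the expression values are all assumptions for illustration.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if se == 0.0:
        return float("nan")  # both groups constant: statistic undefined
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def t_for_gene(item):
    """Compute the statistic for one (gene, (group_a, group_b)) pair."""
    gene, (group_a, group_b) = item
    return gene, welch_t(group_a, group_b)


def run_all(expression, map_fn=map):
    """Apply the per-gene test with any map-like function (serial or pooled)."""
    return dict(map_fn(t_for_gene, expression.items()))


# Hypothetical expression values: gene -> (control samples, mutant samples).
expression = {
    "geneX": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    "geneY": ([5.0, 5.5, 6.0], [5.1, 5.4, 6.1]),
}

serial = run_all(expression)
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = run_all(expression, pool.map)

# Serial lower bound from the slide: 36,000 tests at 1 minute each, in days.
serial_days = 36_000 * 1 / (60 * 24)
print(serial, serial_days)
```

Passing the mapping function as a parameter keeps the per-gene logic unchanged whether it runs serially, on a local pool, or on a cluster framework. In CPython a thread pool mainly illustrates the structure; real speedup for CPU-bound work would need processes or a distributed engine.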
  • 38. Analytics Solutions. Extreme Analytics Platform • Turnkey Advanced Analytics Platform • Next-Generation System Architecture • Engineered for Performance. Graph Discovery Appliance • Discover Unknown & Hidden Relationships in Big Data • Real-time Data Discovery • Realize Rapid Time-to-Value
  • 39. Thank you
  • 40. Panel & audience discussion Please enter your questions into the question or chat boxes