Big Data Analyses in Pharma
An Overview
Josef Scheiber, PhD
Managing Director
July 2015
Geography
Startup Center in Waldsassen
Main site
Data Analyses and Software Development
Westpark Center
Garmischer Str. in Munich
Scientific Activities
Since Jan 1, 2015
Basel/Switzerland
Data Curation and customer-related activities
Prague: 150 km
Munich: 200 km
Berlin: 300 km
Frankfurt: 250 km
BioVariance at a Glance –
Get the most out of your complex data
Curate.Integrate
Analyze.Model
Visualize.Explore
DECIDE
Overview
• Background
• Strategy
• Examples
Background
Courtesy: M. Zeinab, slideshare
What do we need out of Big Data?
1. What are the inhibitors of kinase X and the five most similar
kinases with IC50 < 1 μM and with MW < 500 from all internal and
external data sources?
2. What assay technologies have been used against my kinase?
Which cell lines?
3. What other proteins are in the same kinase branch as target X,
where there were validated chemical hits from external or
internal sources?
4. If I hit a particular kinase, what would the potential side-effect
profile look like? Which known inhibitor of this kinase has the
best safety profile and the fewest known IC50s?
5. Have I identified other compounds with a bioactivity profile
similar to compound X and with the same core substructure?
6. Can we create a phylochemical tree of kinases and for a new
kinase target place it into the tree on the basis of activity against a
reference panel of compounds?
7. Have I identified all kinases with an x-ray structure (in-house or
external) that are in pathway X?
Bridging Chemical and Biological Data: Implications for Pharmaceutical Drug Discovery.
JL Jenkins, J Scheiber, D Mikhailov, A Bender, A Schuffenhauer, B Cornett, V Chan, J Kondracki, B Rohde, JW Davies (2012). In: Computational Approaches in Cheminformatics and Bioinformatics. Edited by: A Bender, R Guha. 25-56. John Wiley & Sons, Inc.
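Question 1 above can be read as a filter over an integrated activity table. The following is a minimal sketch under stated assumptions, not the authors' implementation: the record fields (`compound`, `target`, `ic50_um`, `mw`), the compound and kinase names, all values, and the precomputed `similar_kinases` list are hypothetical and only illustrate the shape of such a query.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    compound: str
    target: str
    ic50_um: float  # IC50 in micromolar
    mw: float       # molecular weight in Da

# Hypothetical records, as if merged from internal and external sources.
ACTIVITIES = [
    Activity("cpd-1", "KINASE_X", 0.05, 420.0),
    Activity("cpd-2", "KINASE_X", 3.20, 380.0),  # IC50 too high
    Activity("cpd-3", "KINASE_Y", 0.80, 510.0),  # MW too high
    Activity("cpd-4", "KINASE_Y", 0.30, 455.0),
    Activity("cpd-5", "KINASE_Z", 0.10, 300.0),  # target not requested
]

def find_inhibitors(activities, targets, max_ic50_um=1.0, max_mw=500.0):
    """Return compounds hitting any requested target with IC50 and MW below the cutoffs."""
    wanted = set(targets)
    return sorted(
        {a.compound for a in activities
         if a.target in wanted and a.ic50_um < max_ic50_um and a.mw < max_mw}
    )

# Kinase X plus its most similar kinases (assumed to be computed elsewhere).
similar_kinases = ["KINASE_Y"]
hits = find_inhibitors(ACTIVITIES, ["KINASE_X", *similar_kinases])
print(hits)
```

In practice the hard part is not the filter but getting all sources to agree on compound and target names, which is what the vocabulary strategy later in the deck addresses.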
ANSWERS
Context matters!
metabolites
drugs
targets
pathways
diseases (phenotypes)
RNA
DNA
It's not that simple …
Descriptive: What happened?
Diagnostic: Why did it happen?
Predictive: What will happen?
Prescriptive: How can we make it happen?
Better data for better analytics
Hindsight Insight Foresight
Need for interpretation
[Stacked bar chart, 0% to 100%: relative share of Experimental Design, Experiment, and Data Analysis in five eras: Before molecular biology, Molecular biology golden age, Genomics age, Deep sequencing age, Very soon]
Big Data?
Volume
Genome Sequencing
Slide adapted from George Church
Cost Reduction - Example
Ferrari 458 Spider: $398,000 in 2006, 40 cents now (had its price fallen as fast as sequencing costs)
→ Much more data for far less money
Challenges for Informatics?
1 genome is roughly 500 GB of data
2011: several hundred exomes
Drug Discovery Pipeline
Target finding
Lead finding
Lead optimization
… Phase 1 … Market
Drug candidates Patients
Velocity
• Mutations in tumor
• Resistance mechanisms in patients
• Long-term/short-term adverse events (AE)
• Compliance
• Nutrition and microbiome
• Data from wearables relevant for drugs
For each patient
Variety
• Bioinformatics
• Clinical
• Social network
• E-health
• Also text/patents
A simplified overview –
Molecules in Man
Adapted from Gohlke JM, Portier CJ.
Environ. Health Perspect. 115:1261-1263 (2007)
A question of complexity – They all interact …
Biology
Chemistry
Physics
Dealing with a very complex environment – i.e. many opportunities
• DNA
• RNA
• Protein
• Interactions
• Clinical parameters
• Treatment history
• Tissue anatomy
• Surgical history
• Epigenetic profiles from many patients at different time points
• Target
• Off-targets
• Metabolites
• Additional indications
• Unspecific effects
• Similar drugs
Adapted from: J. Scheiber; How can we enable drug discovery informatics for personalized healthcare?
Expert Opinion on Drug Discovery, 1-6; 2/2011
… individual polypharmacology
Sequences Expression Proteomics Biological networks
(but also: Cells, Tissues, Organs)
POPULATION
Veracity
• Chemogenomics data
• Gene expression data
→ Imputation?
Veracity - Chemogenomics
Adapted from Tanrikulu et al. Missing
Value Estimation for Compound-
Target Activity Data, J. Mol. Inf
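To make the imputation question concrete, here is a deliberately simple sketch: missing entries in a compound-by-target activity matrix are filled with the mean of the observed values for the same target. This is a naive baseline chosen for illustration, not the estimation method of the cited paper, and the matrix values are invented.

```python
from statistics import mean

def impute_column_mean(matrix):
    """Fill None entries with the mean of the observed values in the same column."""
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        # A column without any measurement stays None.
        col_means.append(mean(observed) if observed else None)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in matrix
    ]

# Hypothetical pIC50 values: rows are compounds, columns are targets.
activity = [
    [6.0, None, 5.0],
    [7.0, 8.0, None],
    [None, 6.0, 7.0],
]
filled = impute_column_mean(activity)
for row in filled:
    print(row)
```

Every imputed value is a guess that downstream models will treat as data, which is why this belongs under veracity rather than volume.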
Veracity - Interactomics
A Proteome-Scale Map of the Human Interactome Network.
Rolland, Thomas et al.
Cell, Volume 159, Issue 5, 1212-1226
Veracity – Social Media
Strategy
Biological/Pharmacological
Understanding
drugs
targets
pathways
diseases (phenotypes)
Data integration strategy
a) A central vocabulary/pointer server (it stores preferred names and synonyms plus pointers to the data servers, i.e. where to find what)
b) A semantic integration layer with domain-specific terminology and reference data
c) A database for each data type collected, storing only preferred names along with raw measurements
d) Clearly defined APIs for further integration with public data sources and to enable large-scale analyses
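Item a) can be sketched as a small lookup service. The class below is a hypothetical illustration, not the system described in the talk: the names `VocabularyServer`, `register`, `resolve` and `locate`, and the data server names `assay_db` and `clinical_db`, are assumptions. The drug names and codes are the Adalimumab synonyms listed in the tweet-mining example later in the deck.

```python
class VocabularyServer:
    """Maps synonyms to preferred names, and preferred names to data servers."""

    def __init__(self):
        self._preferred = {}  # lowercase term -> preferred name
        self._locations = {}  # preferred name -> set of data server names

    def register(self, preferred, synonyms=(), servers=()):
        """Store a preferred name, its synonyms, and where its data lives."""
        for term in (preferred, *synonyms):
            self._preferred[term.lower()] = preferred
        self._locations.setdefault(preferred, set()).update(servers)

    def resolve(self, term):
        """Translate any known term into its preferred name (None if unknown)."""
        return self._preferred.get(term.lower())

    def locate(self, term):
        """Return the sorted data servers that hold facts about a term."""
        preferred = self.resolve(term)
        return sorted(self._locations.get(preferred, ()))

vocab = VocabularyServer()
vocab.register(
    "adalimumab",
    synonyms=["Humira", "Exemptia", "331731-18-1", "L04AB04"],
    servers=["assay_db", "clinical_db"],  # hypothetical server names
)
print(vocab.resolve("HUMIRA"))
print(vocab.locate("Exemptia"))
```

Because the per-datatype databases in item c) store only preferred names, every query can be normalised once at this layer and then sent to the right server.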
Vocabularies needed
• Genes, Drugs, Proteins
• Diseases
• Organisms
• Microbiome species & genes
• Localization & source
• Phenotype
• Metabolite common names
Answering workflow
Vocabulary
Vocabulary server acts as translator, aggregator and locator, i.e. it knows where the respective facts can be found
Firmicutes produce alpha-Linolein and thereby cause gut irritation
species
metabolite
Further
Data of each type is stored in a specific database to enhance the performance of large-scale analyses
Expert tools talk to data directly or via web services
API
End user interface and visualization
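The translator role in the answering workflow can be illustrated with the example sentence from this slide. This is a minimal dictionary-based sketch, assuming a tiny hypothetical vocabulary; the slide itself only marks the species and the metabolite, so the "phenotype" entry for "gut irritation" is an added assumption.

```python
import re

# Hypothetical vocabulary: lowercase surface term -> entity type.
VOCABULARY = {
    "firmicutes": "species",
    "alpha-linolein": "metabolite",
    "gut irritation": "phenotype",  # assumption, not marked on the slide
}

def annotate(sentence, vocabulary=VOCABULARY):
    """Return (matched text, entity type, start offset) for every vocabulary hit."""
    hits = []
    lowered = sentence.lower()
    for term, entity_type in vocabulary.items():
        for match in re.finditer(re.escape(term), lowered):
            original = sentence[match.start():match.end()]
            hits.append((original, entity_type, match.start()))
    # Report hits in the order they appear in the sentence.
    return sorted(hits, key=lambda hit: hit[2])

sentence = "Firmicutes produce alpha-Linolein and thereby cause gut irritation"
annotations = annotate(sentence)
for text, entity_type, _start in annotations:
    print(f"{text}: {entity_type}")
```

Once terms are typed like this, each one can be routed to the database that stores that data type.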
Examples
Genome data at scale
Workflow
Identify drug targets (primary and off-targets, from DrugBank)
Call variants on a per-individual basis
Analyse mutation rates in the targets and in particular in drug binding pockets
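The last workflow step can be sketched as follows. The definition of "mutation rate" used here (the fraction of residue positions that carry a variant in at least one individual) is an assumption made for illustration, as are the function name, the target names, the protein lengths and the pocket residue sets.

```python
from collections import defaultdict

def mutation_rates(variants, protein_lengths, pocket_residues):
    """Per target: fraction of mutated positions overall and inside the binding pocket.

    variants: iterable of (individual, target, residue_position) tuples.
    """
    mutated = defaultdict(set)
    for _individual, target, position in variants:
        mutated[target].add(position)  # count each position once across individuals

    rates = {}
    for target, length in protein_lengths.items():
        pocket = pocket_residues.get(target, set())
        hit = mutated.get(target, set())
        rates[target] = {
            "overall": len(hit) / length,
            "pocket": len(hit & pocket) / len(pocket) if pocket else 0.0,
        }
    return rates

# Hypothetical per-individual variant calls mapped to protein residues.
variants = [
    ("ind-1", "TARGET_A", 10),
    ("ind-2", "TARGET_A", 10),
    ("ind-2", "TARGET_A", 55),
    ("ind-3", "TARGET_A", 77),
    ("ind-3", "TARGET_A", 20),
]
protein_lengths = {"TARGET_A": 100, "TARGET_B": 250}
pocket_residues = {"TARGET_A": {10, 20, 30, 40}, "TARGET_B": {5, 6}}

rates = mutation_rates(variants, protein_lengths, pocket_residues)
print(rates)
```

A pocket rate well above the overall rate would flag a target whose drug binding may differ between individuals.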
Example: Donepezil /
Acetylcholinesterase
• PDB 4EY7
Image extracted from Cheung et al.,
2012 [2]
Example: Acetylcholinesterase
Integrative Genomics Viewer
Not very successful
Alignment of the 3D structures of mutant number 52 (yellow) and the PDB 4EY7 AChE protein (green). The only changed residue is Y150 (magenta), which becomes H150 (red). The white surface represents the molecular surface of donepezil.
Why is this a bad example?
AChE is a key enzyme in human biology → such enzymes are the most highly conserved, even across species
→ Learning: check conservation first, before investing time
Generating Vocabularies
Vocabulary generation
Extensive mapping of terms from various sources
397,211 preferred names
598,532 synonyms
102,086 identifiers
The chevron diagram shows the number of samples annotated with names. The numbers alone show that mapping everything is non-trivial.
A Big Data exercise in itself …
Tweet mining
Mining Twitter for side effects
Needed: drug name and synonyms, e.g.
Adalimumab
Humira
Exemptia
331731-18-1 (CAS number)
L04AB04 (ATC code)
MedDRA vocabulary
Many birds tweet lots of noise …
BUT …
Counts per drug and side-effect term:

Drug         headache  rash  pain  bleeding  cough
Lipitor      0         1     27    0         0
Lisinopril   0         0     8     0         7
Simvastatin  0         0     0     0         0
Plavix       0         0     0     1         0
Crestor      0         0     0     0         0
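Counts like those above could be produced by checking each tweet for a drug name (or synonym) together with a side-effect term. The sketch below is an assumption about the general approach, not the code behind the slide: the tweets are invented, the function name `count_comentions` is hypothetical, and plain substring matching stands in for proper MedDRA term mapping.

```python
from itertools import product

def count_comentions(tweets, drug_synonyms, effect_terms):
    """Count tweets that mention a drug (or a synonym) together with an effect term."""
    lowered = [tweet.lower() for tweet in tweets]
    counts = {}
    for (drug, synonyms), effect in product(drug_synonyms.items(), effect_terms):
        names = [name.lower() for name in (drug, *synonyms)]
        counts[(drug, effect)] = sum(
            1 for text in lowered
            if effect.lower() in text and any(name in text for name in names)
        )
    return counts

# Invented tweets, for illustration only.
tweets = [
    "Started Lipitor last week, leg pain every night",
    "atorvastatin giving me muscle pain again",
    "Lisinopril cough is real",
    "Pharmacy deal: Lipitor 20% off!",
]
drug_synonyms = {"Lipitor": ["atorvastatin"], "Lisinopril": []}
effects = ["pain", "cough"]

counts = count_comentions(tweets, drug_synonyms, effects)
for (drug, effect), n in counts.items():
    print(drug, effect, n)
```

The second tweet is only counted because the generic name is registered as a synonym, which illustrates the synonym issue noted for the drugs with zero mentions. The advertising tweet shows why raw mention counts need filtering.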
Top 200 drugs
- The cutoff is at 1500 tweets, which a few drugs easily surpass (although it is mostly pharmacies advertising).
- Others are not mentioned once (probably a synonym issue, as I restricted the language to English).
- Top drugs are tweeted more often, but e.g. Tarceva (in 2006), at the very bottom, also reaches the top number of tweets (109 on the list).
089 – 189 6582 – 80
Garmischer Str. 4/V
80339 München
josef.scheiber@biovariance.com
09632 – 9248 325
Konnersreuther Str. 6g
95652 Waldsassen
Questions?

More Related Content

What's hot

Big Data, AI, and Pharma
Big Data, AI, and PharmaBig Data, AI, and Pharma
Big Data, AI, and PharmaAmit Sheth
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for HealthcareChandan Reddy
 
Digital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care RelationshipDigital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care RelationshipKumaraguru Veerasamy
 
Data Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare IndustryData Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare IndustryDinesh V
 
Digital transformation in the Pharma Industry
Digital transformation in the Pharma Industry Digital transformation in the Pharma Industry
Digital transformation in the Pharma Industry Maria Alexandri
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in MedicineNasir Arafat
 
How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...
How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...
How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...Nick Brown
 
Artificial Intelligence in Health Care
Artificial Intelligence in Health Care Artificial Intelligence in Health Care
Artificial Intelligence in Health Care 247 Labs Inc
 
How to Create a Data Analytics Roadmap
How to Create a Data Analytics RoadmapHow to Create a Data Analytics Roadmap
How to Create a Data Analytics RoadmapCCG
 
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili SaghafiBusiness Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili SaghafiProfessor Lili Saghafi
 
Big data analytics in healthcare industry
Big data analytics in healthcare industryBig data analytics in healthcare industry
Big data analytics in healthcare industryBhagath Gopinath
 
Analytics in healthcare
Analytics in healthcareAnalytics in healthcare
Analytics in healthcareAnushkaAlok
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingHealth Catalyst
 
The Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationThe Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationAnalytics8
 
BI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyBI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyShivam Dhawan
 

What's hot (20)

Data analytics
Data analyticsData analytics
Data analytics
 
Big Data, AI, and Pharma
Big Data, AI, and PharmaBig Data, AI, and Pharma
Big Data, AI, and Pharma
 
Digital Health Care Technology
Digital Health Care TechnologyDigital Health Care Technology
Digital Health Care Technology
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
 
Digital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care RelationshipDigital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care Relationship
 
Big data
Big dataBig data
Big data
 
Data Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare IndustryData Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare Industry
 
Digital transformation in the Pharma Industry
Digital transformation in the Pharma Industry Digital transformation in the Pharma Industry
Digital transformation in the Pharma Industry
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in Medicine
 
How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...
How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...
How AstraZeneca is Applying AI, Imaging & Data Analytics (AI-Driven Drug Deve...
 
Artificial Intelligence in Health Care
Artificial Intelligence in Health Care Artificial Intelligence in Health Care
Artificial Intelligence in Health Care
 
How to Create a Data Analytics Roadmap
How to Create a Data Analytics RoadmapHow to Create a Data Analytics Roadmap
How to Create a Data Analytics Roadmap
 
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili SaghafiBusiness Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
 
Big data analytics in healthcare industry
Big data analytics in healthcare industryBig data analytics in healthcare industry
Big data analytics in healthcare industry
 
Analytics in healthcare
Analytics in healthcareAnalytics in healthcare
Analytics in healthcare
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
 
Medical Data Analysis
Medical Data AnalysisMedical Data Analysis
Medical Data Analysis
 
The Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationThe Path to Data and Analytics Modernization
The Path to Data and Analytics Modernization
 
Big data
Big dataBig data
Big data
 
BI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyBI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and Strategy
 

Viewers also liked

Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Hellmuth Broda
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 
Improving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsImproving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsPaul Grant
 
Data mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industryData mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industrylurdhu agnes
 
New Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the SolutionNew Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the SolutionDr. Sandeep Juneja
 
Application of BI in pharmaceutical industry
Application of BI in pharmaceutical industryApplication of BI in pharmaceutical industry
Application of BI in pharmaceutical industryBiBoard.Org
 
Bio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draftBio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draftJosef Scheiber
 
BioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile PredictionBioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile PredictionJosef Scheiber
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutJosef Scheiber
 
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...Josef Scheiber
 
BioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug DiscoveryBioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug DiscoveryJosef Scheiber
 
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneMobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneJosef Scheiber
 
Digital Asset Management in Pharma
Digital Asset Management in PharmaDigital Asset Management in Pharma
Digital Asset Management in Pharmaphillycaferacer
 
Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010phillycaferacer
 
Big Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized MedicineBig Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized MedicineSAP Technology
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive AnalyticsRonald.Ramos
 
20160512 predictive and adaptive approach
20160512   predictive and adaptive approach20160512   predictive and adaptive approach
20160512 predictive and adaptive approachSilvia Fragola
 
Agile 2013 presentation, tom grant
Agile 2013 presentation, tom grantAgile 2013 presentation, tom grant
Agile 2013 presentation, tom grantTom Grant
 
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...Society of Women Engineers
 

Viewers also liked (20)

Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma
 
Analytics in Pharmaceutical Industry
Analytics in Pharmaceutical IndustryAnalytics in Pharmaceutical Industry
Analytics in Pharmaceutical Industry
 
Improving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsImproving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutions
 
Data mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industryData mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industry
 
New Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the SolutionNew Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the Solution
 
Application of BI in pharmaceutical industry
Application of BI in pharmaceutical industryApplication of BI in pharmaceutical industry
Application of BI in pharmaceutical industry
 
Bio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draftBio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draft
 
BioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile PredictionBioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile Prediction
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in Noordwijkerhout
 
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
 
BioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug DiscoveryBioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug Discovery
 
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneMobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
 
Digital Asset Management in Pharma
Digital Asset Management in PharmaDigital Asset Management in Pharma
Digital Asset Management in Pharma
 
Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010
 
Big Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized MedicineBig Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized Medicine
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive Analytics
 
20160512 predictive and adaptive approach
20160512   predictive and adaptive approach20160512   predictive and adaptive approach
20160512 predictive and adaptive approach
 
Agile 2013 presentation, tom grant
Agile 2013 presentation, tom grantAgile 2013 presentation, tom grant
Agile 2013 presentation, tom grant
 
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
 

Similar to Big Data in Pharma - Overview and Use Cases

Introduction to Bioinformatics.
 Introduction to Bioinformatics. Introduction to Bioinformatics.
Introduction to Bioinformatics.Elena Sügis
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsJTADrexel
 
Artificial Intelligence for Discovery
Artificial Intelligence for DiscoveryArtificial Intelligence for Discovery
Artificial Intelligence for DiscoveryDayOne
 
01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptx01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptxHussainTaqi1
 
acs talk open source drug discovery
acs talk open source drug discoveryacs talk open source drug discovery
acs talk open source drug discoverySean Ekins
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challengesinside-BigData.com
 
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...DATAVERSITY
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataChirag Patel
 
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...Pistoia Alliance
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)jmoore89
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p collegeSKUASTKashmir
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical DataPaul Agapow
 
2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdf2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdfYue Cathy Chang
 
Big Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBig Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBigData_Europe
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!adcobb
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08Russ Altman
 
Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Sage Base
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 

Similar to Big Data in Pharma - Overview and Use Cases (20)

Introduction to Bioinformatics.
 Introduction to Bioinformatics. Introduction to Bioinformatics.
Introduction to Bioinformatics.
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Artificial Intelligence for Discovery
Artificial Intelligence for DiscoveryArtificial Intelligence for Discovery
Artificial Intelligence for Discovery
 
01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptx01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptx
 
acs talk open source drug discovery
acs talk open source drug discoveryacs talk open source drug discovery
acs talk open source drug discovery
 
Online Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery SystemsOnline Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery Systems
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
 
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big data
 
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p college
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdf2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdf
 
Big Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBig Data Analytics in the Health Domain
Big Data Analytics in the Health Domain
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 

Big Data in Pharma - Overview and Use Cases

  • 1. Big Data Analyses in Pharma: An Overview. Josef Scheiber, PhD, Managing Director. July 2015
  • 2. Geography: Startup Center in Waldsassen, main site, data analyses and software development; Westpark Center, Garmischer Str. in Munich, scientific activities; since Jan 1, 2015, Basel/Switzerland, data curation and customer-related activities; Prague 150 km, Munich 200 km, Berlin 300 km, Frankfurt 250 km
  • 3. BioVariance at a Glance – Get the most out of your complex data: Curate.Integrate, Analyze.Model, Visualize.Explore, DECIDE
  • 6. Courtesy: M. Zeinab, slideshare
  • 7. What do we need out of Big Data? 1. What are the inhibitors of kinase X and the five most similar kinases with IC50 < 1 μM and with MW < 500 from all internal and external data sources? 2. What assay technologies have been used against my kinase? Which cell lines? 3. What other proteins are in the same kinase branch as target X, where there were validated chemical hits from external or internal sources? 4. If I hit a particular kinase, what would the potential side-effect profile look like? Which known inhibitor of this kinase has the best safety profile and the fewest known IC50s? 5. Have I identified other compounds with a bioactivity profile similar to compound X and with the same core substructure? 6. Can we create a phylochemical tree of kinases and for a new kinase target place it into the tree on the basis of activity against a reference panel of compounds? 7. Have I identified all kinases with an x-ray structure (in-house or external) that are in pathway X? Source: Bridging Chemical and Biological Data: Implications for Pharmaceutical Drug Discovery. JL Jenkins, J Scheiber, D Mikhailov, A Bender, A Schuffenhauer, B Cornett, V Chan, J Kondracki, B Rohde, JW Davies (2012). In: Computational Approaches in Cheminformatics and Bioinformatics. Edited by: A Bender, R Guha. 25-56. John Wiley & Sons, Inc. ANSWERS
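Question 1 on slide 7 is, at its core, a filter over integrated activity records. The following is a minimal sketch of that filter, assuming the internal and external data have already been merged into one list; the `Activity` record, its field names, and all compound and kinase identifiers are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    compound: str      # hypothetical compound identifier
    target: str        # kinase the compound was measured against
    ic50_um: float     # IC50 in micromolar
    mol_weight: float  # molecular weight in g/mol
    source: str        # "internal" or "external"


def potent_small_inhibitors(records, targets, max_ic50_um=1.0, max_mw=500.0):
    """Keep records on any of `targets` with IC50 < max_ic50_um and MW < max_mw."""
    wanted = set(targets)
    return [
        r for r in records
        if r.target in wanted and r.ic50_um < max_ic50_um and r.mol_weight < max_mw
    ]


# Made-up example data, for illustration only.
records = [
    Activity("cmpd-1", "KinaseX", 0.05, 420.0, "internal"),
    Activity("cmpd-2", "KinaseX", 3.20, 380.0, "external"),  # IC50 too high
    Activity("cmpd-3", "KinaseY", 0.40, 610.0, "external"),  # too heavy
    Activity("cmpd-4", "KinaseY", 0.90, 455.0, "external"),
    Activity("cmpd-5", "KinaseZ", 0.01, 300.0, "internal"),  # target not requested
]

# "KinaseY" stands in for one of the "five most similar kinases".
hits = potent_small_inhibitors(records, ["KinaseX", "KinaseY"])
```

Finding the "five most similar kinases" is a separate step (for example via sequence or binding-site similarity) and is deliberately left out of this sketch.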
  • 10. Descriptive: What happened? Diagnostic: Why did it happen? Predictive: What will happen? Prescriptive: How can we make it happen? Better data for better analytics Hindsight Insight Foresight
  • 11. Need for interpretation [stacked bar chart, 0% to 100%: split between Data Analysis, Experiment, and Experimental Design for five eras: Before molecular biology, Molecular biology golden age, Genomics age, Deep sequencing age, Very soon]
  • 14. Genome Sequencing Slide adapted from George Church
  • 15. Genome Sequencing Slide adapted from George Church
  • 16. Cost Reduction - Example 458 Ferrari Spider - $398,000 in 2006 – 40 cents now!
  • 17. → Much more data for way less money
  • 18. Challenges for Informatics? – 1 genome is roughly 500 GB of data – 2011: several hundred exomes
  • 19. Drug Discovery Pipeline: Target finding → Lead finding → Lead optimization → … → Phase 1 → … → Market (drug candidates, patients)
  • 21. Velocity • Mutations in tumor • Resistance mechanisms in patients • Long-term/short-term adverse events (AE) • Compliance • Nutrition and microbiome • Data from wearables relevant for drugs
  • 25. Variety • Bioinformatics • Clinical • Social network • E-health • Also text/patents
  • 26. A simplified overview – Molecules in Man Adapted from Gohlke JM, Portier CJ. Environ. Health Perspect. 115:1261-1263 (2007)
  • 27. A question of complexity – They all interact … Biology Chemistry Physics
  • 28. Dealing with a very complex environment – i.e. many opportunities • DNA • RNA • Protein • Interactions • Clinical parameters • Treatment history • Tissue anatomy • Surgical history • Epigenetic profiles from many patients at different time points • Target • Off-targets • Metabolites • Additional indications • Unspecific effects • Similar drugs. Adapted from: J. Scheiber; How can we enable drug discovery informatics for personalized healthcare? Expert Opinion on Drug Discovery, 1-6; 2/2011
  • 30. Sequences Expression Proteomics Biological networks (but also: Cells, Tissues, Organs) POPULATION
  • 32. Veracity • Chemogenomics data • Gene expression data → Imputation?
  • 33. Veracity - Chemogenomics Adapted from Tanrikulu et al. Missing Value Estimation for Compound-Target Activity Data, J. Mol. Inf
  • 34. Veracity - Interactomics A Proteome-Scale Map of the Human Interactome Network. Rolland, Thomas et al. Cell, Volume 159, Issue 5, 1212-1226
  • 39. Data integration strategy a) A central vocabulary/pointer server (it stores preferred names and synonyms plus pointers to the data servers, i.e. where to find what) b) A semantic integration layer with domain-specific terminology and referential data c) A database for each data type collected, storing only preferred names along with raw measurements d) Clearly defined APIs for further integration with public data sources and to enable large-scale analyses
  • 40. Vocabularies needed • Genes, Drugs, Proteins • Diseases • Organisms • Microbiome species & genes • Localization & source • Phenotype • Metabolite common names
  • 41. Answering workflow. Vocabulary: the vocabulary server acts as translator, aggregator and locator, i.e. it knows where the respective facts can be found. Example: "Firmicutes produce alpha-Linolein and thereby cause gut irritation" (Firmicutes: species; alpha-Linolein: metabolite). Further: data of each type is stored in a specific database to enhance the performance of large-scale analyses. Expert tools talk to the data directly or via web services (API). End-user interface and visualization
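The translator and locator roles of the vocabulary server (slides 39 and 41) can be sketched in a few lines. This is only an illustration under stated assumptions: the `VocabularyServer` class, its method names, and the data-server names (`drug_db`, `side_effect_db`, `microbiome_db`) are hypothetical. The drug synonyms are the ones listed on slide 55.

```python
class VocabularyServer:
    """Maps synonyms to a preferred name and points to the servers holding the data."""

    def __init__(self):
        self._preferred = {}  # lower-cased name or synonym -> preferred name
        self._locations = {}  # preferred name -> set of data-server names

    def register(self, preferred, synonyms=(), servers=()):
        # The preferred name is also registered as a lookup key for itself.
        for name in (preferred, *synonyms):
            self._preferred[name.lower()] = preferred
        self._locations.setdefault(preferred, set()).update(servers)

    def translate(self, term):
        """Return the preferred name for a term, or None if it is unknown."""
        return self._preferred.get(term.lower())

    def locate(self, term):
        """Return (preferred name, sorted list of servers) for a term."""
        preferred = self.translate(term)
        if preferred is None:
            return None, []
        return preferred, sorted(self._locations[preferred])


vocab = VocabularyServer()
vocab.register(
    "Adalimumab",
    synonyms=["Humira", "Exemptia", "331731-18-1", "L04AB04"],
    servers=["drug_db", "side_effect_db"],  # hypothetical server names
)
vocab.register("Firmicutes", servers=["microbiome_db"])  # hypothetical server name

preferred, servers = vocab.locate("humira")
```

Because the per-type databases store only preferred names, every query term has to pass through `translate` before any data server is contacted.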
  • 43. Genome data at scale
  • 44. Workflow: Identify drug targets (primary and off-targets, from DrugBank). Call variants on a per-individual basis
  • 45. Workflow Analyse mutation rates in the targets and in particular drug binding pockets
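The second workflow step (slide 45) compares how often a target is mutated overall with how often its drug binding pocket is hit. A minimal sketch, assuming variants have already been called per individual and mapped to residue positions of one target; the function name, the protein length, the pocket residues, and the individuals are made up for illustration.

```python
def mutation_rates(variants_by_individual, protein_length, pocket_residues):
    """Summarize mutation frequency for one target.

    variants_by_individual: dict mapping an individual's ID to the set of
    mutated residue positions found in this target for that individual.
    """
    n = len(variants_by_individual)
    if n == 0:
        raise ValueError("need at least one individual")
    pocket = set(pocket_residues)
    total = sum(len(v) for v in variants_by_individual.values())
    in_pocket = sum(len(v & pocket) for v in variants_by_individual.values())
    carriers = sum(1 for v in variants_by_individual.values() if v & pocket)
    return {
        # mutated positions per residue per individual, whole protein
        "per_residue_overall": total / (n * protein_length),
        # same measure restricted to the binding pocket
        "per_residue_pocket": in_pocket / (n * len(pocket)),
        # fraction of individuals with at least one pocket variant
        "pocket_carrier_fraction": carriers / n,
    }


# Toy input: four individuals, a 500-residue protein, a three-residue pocket.
variants = {
    "ind-A": {150},
    "ind-B": set(),
    "ind-C": {10, 300},
    "ind-D": {10, 150},
}
rates = mutation_rates(variants, protein_length=500, pocket_residues=[72, 150, 286])
```

A pocket rate well above the overall rate would flag a target whose binding site varies between individuals; for a highly conserved enzyme both numbers are expected to be close to zero.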
  • 46. Example: Donepezil / Acetylcholinesterase • PDB 4EY7 Image extracted from Cheung et al., 2012 [2]
  • 49. Not very successful. Alignment of the 3D structures of mutant number 52 (yellow) and PDB 4EY7 AChE protein (green). The only changed residue is Y150 (magenta), mutated to H150 (red). The white surface represents the molecular surface of donepezil.
  • 50. Why is this a bad example? AChE is a key enzyme in human biology → such enzymes are the most highly conserved, even across species → Learning: check conservation before investing time
  • 52. Vocabulary generation Extensive mapping of terms from various sources
  • 53. Vocabulary generation: 397211 preferred names, 598532 synonyms, 102086 identifiers. The chevron diagram shows the number of samples annotated with names. The numbers alone already show that mapping everything is non-trivial. A Big Data exercise in itself …
  • 55. Mining Twitter for side effects. Needed: drug name and synonyms (Adalimumab, Humira, Exemptia, 331731-18-1, L04AB04); MedDRA vocabulary
  • 56. Many birds tweet lots of noise … BUT … Tweet counts per drug and side-effect term: Lipitor: headache 0, rash 1, pain 27, bleeding 0, cough 0. Lisinopril: headache 0, rash 0, pain 8, bleeding 0, cough 7. Simvastatin: headache 0, rash 0, pain 0, bleeding 0, cough 0. Plavix: headache 0, rash 0, pain 0, bleeding 1, cough 0. Crestor: headache 0, rash 0, pain 0, bleeding 0, cough 0.
  • 57. Top 200 drugs - Cutoff is at 1500 tweets, which a few drugs easily surpass (although it is mostly only pharmacies advertising). - Others are not mentioned once (probably a synonym issue, as I restricted the search to English). - Top drugs are tweeted more often, but e.g. Tarceva (in 2006), at the very bottom, also reaches the top number of tweets (109 on list).
  • 58. Questions? Munich: 089 – 189 6582 – 80, Garmischer Str. 4/V, 80339 München. Waldsassen: 09632 – 9248 325, Konnersreuther Str. 6g, 95652 Waldsassen. E-mail: josef.scheiber@biovariance.com
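Counts like those on slide 56 can be produced by checking each tweet for a drug name (or synonym) together with a side-effect term. The sketch below is an assumption about how such counting might look, not the code used for the talk, whose output appears to come from R. The tweets are invented, the synonym table is a tiny hypothetical stand-in for the vocabulary, and the five effect terms are the ones shown on the slide.

```python
import re
from collections import Counter


def count_co_mentions(tweets, drug_synonyms, effects):
    """Count tweets that mention a drug (by any synonym) together with an effect term."""
    counts = Counter()
    for text in tweets:
        # Crude tokenization: lower-case runs of letters, digits and hyphens.
        words = set(re.findall(r"[a-z0-9-]+", text.lower()))
        for drug, names in drug_synonyms.items():
            if not any(name.lower() in words for name in names):
                continue
            for effect in effects:
                if effect.lower() in words:
                    counts[(drug, effect)] += 1
    return counts


# Invented example tweets, for illustration only.
tweets = [
    "Started Lipitor last week, leg pain is awful",
    "lipitor and muscle pain again today",
    "Lisinopril cough keeping me up",
    "Buy Humira online cheap!!!",  # advertising noise, no effect term
    "Got a rash after my Humira shot",
]
drug_synonyms = {
    "Lipitor": ["Lipitor", "atorvastatin"],
    "Lisinopril": ["Lisinopril"],
    "Adalimumab": ["Adalimumab", "Humira", "Exemptia"],
}
effects = ["headache", "rash", "pain", "bleeding", "cough"]

counts = count_co_mentions(tweets, drug_synonyms, effects)
```

Exact word matching misses misspellings and multi-word terms, which is one reason a curated synonym list and a controlled vocabulary such as MedDRA matter for this kind of mining.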