Big Data Analytics in the Health Domain
1. Integration and analysis of heterogeneous big data for precision
medicine and suggested treatments for different types of patients.
SC1-PM-18-2016: Big Data supporting Public Health policies
Big Data Analytics
Maria-Esther Vidal
Leibniz Information Centre For
Science and Technology
University Library (TIB), Germany
06.11.2017
BigDataEurope Workshop on Big Data in Climate Action,
Environment, Resource Efficiency and Raw Materials
http://project-iasis.eu
@Project_IASIS
2. Big Data Analytics and Drug Side Effects
A Big Data Analytics study found an association
between the use of proton-pump inhibitors and the
likelihood of having a heart attack
[Shah, SH. Clopidogrel Dosing and CYP2C19. Medscape. July 1, 2011. Medscape web site.]
Heart Attack
Electronic Health Records (EHRs) of nearly 3 million
people and trillions of pieces of medical data
Big Data
3. Big Data Analytics and Disease Predisposition
Researchers analyzed more than 7,700 brain
images from 1,171 people in various stages of
Alzheimer's progression using a variety of
techniques including magnetic resonance imaging
(MRI) and positron emission tomography (PET).
Alzheimer’s Disease Evolution
A Big Data Analytics study found that changes in
blood flow are the earliest known warning sign of
Alzheimer's.
Big Data
https://www.sciencedaily.com/releases/2016/07/160712130229.htm
4. Medical Image Processing from Big Data Point of View
Big Data: Medical Images
Large volumes of data produced by imaging techniques
Volume of medical images is growing exponentially
Obtaining annotated data or structured methods to annotate
medical images is challenging
Big Data analytical methods allow for the interpretability
of depicted contents
Scalable methods for collecting, compressing, sharing,
and anonymizing medical data
5. Electronic Health Records (EHR) from Big Data Point of View
Big Data: Clinical Data
Large volumes of Clinical Data that need to be
stored, retrieved, and aggregated
Scalable Methods for collecting, compressing,
sharing, and anonymizing medical data
Scalable Methods for signal processing and for
developing Big Data based clinical decision support
systems (CDSSs)
6. Genomics from Big Data Point of View
Big Data: Genomics Data
Human Genome consists of 30,000 to 35,000 genes
Scalable Methods for pathway analysis and for discovering
associations between observed gene expression changes and
predicted functional effects
Scalable Methods for Reconstruction of Metabolic and
Regulatory Networks
10. Agenda
1. iASiS: Big Data to Support Precision Medicine and Public
Health Policy
2. The iASiS Architecture
3. Big Data Management and Analytics in iASiS
11. Drug Ineffectiveness
[Source: Brian B. Spear, Margo Heath-Chiozzi, Jeffrey Huff, “Clinical application of pharmacogenetics,” Trends in Molecular Medicine, Volume 7, Issue 5, 1 May 2001, pages 201-204.]
13. Drug Side Effects
https://www.topcanadianpharmacy.org/wp-content/uploads/2016/09/Plavix-package.jpg
Drug designed to prevent blood clots
The second-best-selling drug in the world
Different impact on protecting stent patients
from thrombosis depending on patient genetic
variance within CYP2C19
CYP2C19 encodes an enzyme that converts the
drug from an inactive to an active state.
[Shah, SH. Clopidogrel Dosing and CYP2C19. Medscape. July 1, 2011. Medscape web site.]
The right drug and dosage are
selected based on the patient's
genome
14. Precision Medicine
Medical model that proposes the
customization of healthcare, with medical
decisions, practices, or products being
tailored to the individual patient
15. Precision Medicine
Stratified Medicine
Targeted Therapy
Personalized Medicine
P4 Medicine:
Personalized, Predictive,
Preventive, Participatory
16. iASiS: Vision and Objectives
iASiS Vision:
Turn clinical, pharmacogenomics, and other Big Data into actionable
knowledge for personalized medicine and health policy-making
iASiS Objectives:
•Integrate automated unstructured and structured data analysis,
image analysis, and sequence analysis into a Big Data framework
•Use the iASiS framework to support personalized diagnosis and
treatment
22. Pilot 1: Lung Cancer
Motivation:
• Lung cancer is among the most
  • common and deadly diseases
  • costly cancers
• Lung cancer is a heterogeneous
disease. Characteristics differ among
  • patients
  • tumor regions
iASiS will enable:
• Discovery of correlations among
tumor spread, prognosis, response to
treatment
• Unraveling molecular mechanisms
that predict response to different
tumor types (signatures)
23. Big Data in the Lung Cancer Pilot
• Pharmacological knowledge extracted
from publicly available datasets
• Biomedical ontologies and taxonomies
  • terminology standardization
  • semantically describing the EHRs
• EHRs in Spanish
• PET/CT Images
• Genomic Data/Liquid Biopsy
Samples
24. Pilot 2: Alzheimer's Disease
Motivation:
• Approximately 10% of people over
65 suffer from Alzheimer’s
• Heterogeneity of the symptoms
impedes accurate diagnosis and
treatments
iASiS will enable:
• Discovery of patterns associated with
prognosis, outcomes, and response
to treatments
• Association of medical and lifestyle
advice to Alzheimer’s risk and stages
of severity
25. Big Data in the Alzheimer's Disease Pilot
• EHRs in English
• MRI Brain Images
• Genomic Data
• Pharmacological knowledge extracted
from publicly available datasets
• Biomedical ontologies and taxonomies
  • terminology standardization
  • semantically describing the EHRs
26. Clinical Big Data Analytics
[Figure labels: Clinical Notes, NLP, Data Preprocessing, Medical Images, Deep Learning, Predictive Models, Medical Vocabularies, Knowledge, Data Mining, Knowledge Graph]
27. Genomic Big Data Analytics
[Figure labels:]
Identification of RNAs regulated by RBP
Comparison with available information
Integration with transcriptomic data
Hospital-derived data
Knowledge Graph
Identification of key genes and interactions
28. Open Big Data Analytics
•Heterogeneous open data
•Semantic indexing of the data via ontologies and thesauri
•Knowledge extraction from the data
• NLP and network analysis technologies
40. Ontario: Evaluation Study
Benchmark by Ali Hasnain et al., BioFed: federated query processing over
life sciences linked open data. Journal of Biomedical Semantics 2017.
RDF Dataset Name    Number of RDF Triples
Chebi                           4,772,706
DrugBank                          517,023
Kegg                            1,090,830
Affymetrix                     44,207,146
Dailymed                          162,972
Diseasome                          72,445
Sider                             101,542
Medicare                           44,500
LinkedCT                        9,804,652
Linked TCGA-A                  35,329,868
41. Ontario: Evaluation Study
Benchmark by Ali Hasnain et al., BioFed: federated query processing over
life sciences linked open data. Journal of Biomedical Semantics 2017.
Category          #Triple Patterns   #Star-Shaped Subqueries   #Union      #Optional
                  Min   Max          Min   Max                 Min   Max   Min   Max
Simple Queries    3     8            2     4                   0     1     0     1
Complex Queries   6     12           2     5                   0     1     0     1
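The "#Star-Shaped Subqueries" column can be illustrated with a minimal sketch. It assumes the common definition of a star-shaped subquery as a group of triple patterns that share the same subject; the benchmark may count them differently. The function name, the `ex:` prefix, and the toy query are hypothetical and do not come from the benchmark.

```python
from collections import defaultdict


def star_shaped_subqueries(triple_patterns):
    """Group triple patterns by subject.

    Assumption: each group of patterns sharing a subject is treated
    as one star-shaped subquery.
    """
    stars = defaultdict(list)
    for subject, predicate, obj in triple_patterns:
        stars[subject].append((subject, predicate, obj))
    return dict(stars)


# Hypothetical query with four triple patterns (not from the benchmark).
query = [
    ("?drug", "rdf:type", "ex:Drug"),
    ("?drug", "ex:target", "?target"),
    ("?target", "rdf:type", "ex:Target"),
    ("?target", "ex:label", "?label"),
]

stars = star_shaped_subqueries(query)
print(len(query), "triple patterns,", len(stars), "star-shaped subqueries")
```

In a federated setting, grouping by subject is one plausible way to decide which patterns are sent together to the same source.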
42. Ontario: Evaluation Study
• Ontario is compared with
state-of-the-art federated
query processing tools
(ANAPSID and FedX) and a direct
SPARQL endpoint
• Ontario achieves higher
throughput than these
engines
44. Big Data Analytics and Pattern Discovery
[Figure. Steps: Computing Similarity, Pattern Detection (Detecting Patterns), Prediction (Predicting Interactions). Inputs and outputs: Network of Drugs, Targets, and interactions; Patterns of similar Drugs, similar Targets, and their interactions; Discovered Drug-Target Interactions. Other labels: Pragmatics, Context, Semantics, Principle, Drug Similarity, Target Similarity, ChEBI Ontology]
46. Patterns between Drug and Side Effects
Predicted Interactions between Drugs and Side Effects
47. Drug-Target Interaction Discovery
                              Nuclear Receptor   GPCR   Ion Channel   Enzyme
Drugs                         54                 223    210           445
Targets                       26                 95     204           664
Interactions                  90                 635    1,476         2,926
Avg. Interaction per Target   3.46               6.68   7.23          4.4
Avg. Interaction per Drug     1.66               2.84   7.02          6.57
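The two average rows follow directly from the counts in the table above: interactions divided by targets, and interactions divided by drugs. The sketch below recomputes them. The dictionary layout and function name are illustrative choices. Values rounded to two decimals may differ in the last digit from the table, which appears to truncate.

```python
# Counts copied from the drug-target table above.
datasets = {
    "Nuclear Receptor": {"drugs": 54, "targets": 26, "interactions": 90},
    "GPCR": {"drugs": 223, "targets": 95, "interactions": 635},
    "Ion Channel": {"drugs": 210, "targets": 204, "interactions": 1476},
    "Enzyme": {"drugs": 445, "targets": 664, "interactions": 2926},
}


def averages(counts):
    """Return (interactions per target, interactions per drug)."""
    per_target = counts["interactions"] / counts["targets"]
    per_drug = counts["interactions"] / counts["drugs"]
    return per_target, per_drug


for name, counts in datasets.items():
    per_target, per_drug = averages(counts)
    print(f"{name}: {per_target:.2f} per target, {per_drug:.2f} per drug")
```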
48. Supervised Machine Learning Approaches
Machine Learning for Drug-Target Interaction Discovery:
• BLM: Bipartite Local Method [Cheng et al]
• LapRLS: Laplacian Regularized Least Squares [Xia et al]
• GIP: Gaussian Interaction Profile [Van Laarhoven et al]
• KBMF2K: Kernelized Bayesian Matrix Factorization with
twin Kernels [Gonen]
• NBI: Network-Based Inference [Cheng et al]
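As an illustration of one ingredient of these methods, the sketch below computes a Gaussian interaction profile kernel between drugs, following the GIP definition as commonly stated: the similarity of two drugs is exp(-gamma * squared distance between their interaction profiles), with gamma normalized by the mean squared norm of the profiles. Only the kernel is shown; the classifier that the full method builds on top of it is omitted. The function name, the `gamma_prime` parameter name, and the toy matrix are assumptions made for this example.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a binary matrix.

    Each row is the interaction profile of one drug over all targets.
    gamma_prime is a bandwidth setting (name chosen for this sketch).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("at least one interaction is required")
    gamma = gamma_prime / mean_sq_norm
    kernel = []
    for a in profiles:
        row = []
        for b in profiles:
            sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
            row.append(math.exp(-gamma * sq_dist))
        kernel.append(row)
    return kernel


# Hypothetical toy matrix: 3 drugs (rows) x 3 targets (columns).
interactions = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

K = gip_kernel(interactions)
for row in K:
    print([round(v, 3) for v in row])
```

Drugs with identical interaction profiles get similarity 1, and the similarity decays as the profiles diverge.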
49. Unsupervised Approaches
• semEP: Partitions the graph into communities from which new
interactions are predicted [Palma, Vidal, and Raschid]
• Metis: Multilevel recursive-bisection [George Karypis and
Vipin Kumar]
• Ncut: Normalized Cuts [Shi and Malik]
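The idea of predicting new interactions from communities can be sketched as follows. This is not semEP, Metis, or Ncut: the partitioning step is assumed to have already produced the communities, and the prediction rule (every drug-target pair inside a community that is missing from the known interactions) is a simplifying assumption. All names and data are hypothetical.

```python
from itertools import product


def predict_from_communities(communities, known):
    """Predict drug-target pairs that co-occur in a community but are not known.

    communities: list of sets of (drug, target) edges, assumed to come
    from some graph partitioning method.
    known: set of (drug, target) interactions already in the dataset.
    """
    predicted = set()
    for edges in communities:
        drugs = {drug for drug, _ in edges}
        targets = {target for _, target in edges}
        predicted |= set(product(drugs, targets)) - known
    return predicted


# Hypothetical toy data.
known = {("d1", "t1"), ("d2", "t1"), ("d2", "t2"), ("d3", "t3")}
communities = [
    {("d1", "t1"), ("d2", "t1"), ("d2", "t2")},
    {("d3", "t3")},
]

novel = predict_from_communities(communities, known)
print(sorted(novel))
```

Predictions produced this way would still need to be validated against an external source before being treated as discovered interactions.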
50. Drug-Target Interaction Discovery
Top-5 novel interactions: predicted interactions that do not appear in
the dataset and can be validated in STITCH
(http://stitch.embl.de/) and KEGG (http://www.kegg.jp)
Method    Nuclear Receptor   GPCR   Ion Channel   Enzyme
semEP     5                  5      4             1
Metis     3                  5      2             1
Ncut      3                  4      1             1
BLM       2                  1      0             0
NBI       1                  1      1             2
GIP       4                  2      1             3
LapRLS    4                  4      2             2
KBMF2K    4                  4      4             4
52. Conclusions and Future Work
Biomedicine Big Data The iASiS Pipeline
The iASiS Architecture
Big Data Management and Analytics
Next Steps:
● Collect Big Data
● Extract Knowledge from Big Data sources
● Create the iASiS Knowledge Graph
● Enforce Data Access and Privacy Policies
● Big Data Processing and Analytics