© 2016 Mohammad Sadoghi (Purdue University)
ExpoDB: An Exploratory Data Science Platform
(A New Frontier: From Data Processing to Knowledge Exploration)
Mohammad Sadoghi
Assistant Professor
Department of Computer Science
Purdue University
IBM Cognitive Systems Institute Speaker Series
September 29, 2016
Insight is Lost in Islands of Data
http://www.cpsresearch.eu/clinical-trials/
http://news.mit.edu/2015/mnookin-vaccination-public-health-0227
http://www.healthcarepackaging.com/trends-and-issues/clinical-trials
http://stormercellularloo.gq/evolve-ii-clinical-trial.html
https://www.geneticliteracyproject.org
Data is spread across many islands of disconnected sources
(there is no holistic view)
Sadly, adverse drug reactions (ADRs) are the 4th leading cause of
death in the United States, resulting in 100,000 deaths annually
Adverse drug reactions cost over $136 billion in the US annually
Real-time Fusion and Exploration of Data
Real-time Fusion and Exploration of Enriched Data
Real-time Fusion and Exploration of Enriched Data at Web Scale
Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data
[Figure: graph of linked open data. Nodes: approved drugs (Naproxen (Aleve), Methotrexate, Warfarin, Nicotine); genes and enzymes (PTGS2, TP53, DHFR, VKORC1, CYP2C9); diseases (Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Embolism (Blood Clot)). Edge labels: "inhibits", "increased degradation". Gene annotations: "tumor suppressor", "limits cell growth". Class hierarchies: Disease (Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms) and Chemical (Carboxylic Acids, Heterocyclic, Aminopterin, Phenylpropionates).]
Why capture the semantics/context?
Semantics are essential to connect the dots.
How to capture the context?
(1) Instance Layer: Capturing raw data instances, including both structured & semi-structured data
(2) Relation Layer: Capturing the interconnectedness of data instances across data sources
(3) Semantic Layer: Capturing conceptual relationships among data instances and their types
Enriched Data Model: Semantics are essential to connect the dots
[Figure: the drug-gene-disease graph, here with Ibuprofen (Advil) and Acetaminophen (Tylenol), annotated with the three layers.]
Layers: Instance layer, Relation layer, Semantic layer
Progression: Data, Information, Knowledge

Sources:
DrugBank: Bioinformatics & Cheminformatics Resource
CTD: Comparative Toxicogenomics Database
Uniprot: Universal Protein Resource

Drug Name        Drug Targets (Genes)    Symptomatic Treatment
Ibuprofen        PTGS2                   Rheumatoid Arthritis
Acetaminophen    PTGS2                   Relief Fever
Methotrexate     DHFR                    Antineoplastic Anti-metabolite
Warfarin         TP53                    Embolism (Blood Clot)

Gene     Interaction
PTGS2    TP53 (Gene)

Gene     Function
TP53     Tumor Suppressor
DHFR     Limits Cell Growth

Gene     Disease
TP53     Osteosarcoma

Warfarin has a narrow therapeutic range (fatal outcomes)
Dosage for Asian population: 3.4 mg
Dosage for White population: 5.1 mg
Dosage for African-American population: 6.1 mg
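
To make "connecting the dots" concrete, here is a minimal Python sketch, not ExpoDB's implementation. It loads the rows from the tables on this slide as instance data, links them with typed edges, and searches for a chain from a drug to a disease. The edge labels (`targets`, `interacts_with`, `associated_with`) and the helper names are assumptions for illustration. A path found this way is only a candidate link to investigate, not a clinical claim.

```python
from collections import deque

# Instance layer: rows copied from the slide's tables.
DRUG_TARGETS = {
    "Ibuprofen": "PTGS2",
    "Acetaminophen": "PTGS2",
    "Methotrexate": "DHFR",
    "Warfarin": "TP53",
}
GENE_INTERACTIONS = [("PTGS2", "TP53")]
GENE_DISEASES = [("TP53", "Osteosarcoma")]


def build_edges():
    """Relation layer: typed edges across sources (labels are hypothetical)."""
    edges = {}

    def add(src, label, dst):
        edges.setdefault(src, []).append((label, dst))

    for drug, gene in DRUG_TARGETS.items():
        add(drug, "targets", gene)
    for gene_a, gene_b in GENE_INTERACTIONS:
        add(gene_a, "interacts_with", gene_b)
    for gene, disease in GENE_DISEASES:
        add(gene, "associated_with", disease)
    return edges


def find_path(edges, start, goal):
    """Breadth-first search; returns a list of (src, label, dst) hops or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None


EDGES = build_edges()
path = find_path(EDGES, "Ibuprofen", "Osteosarcoma")
for src, label, dst in path:
    print(f"{src} --{label}--> {dst}")
```

No single table links Ibuprofen to Osteosarcoma. The chain only appears once the drug-target, gene-interaction, and gene-disease tables are joined, which is the point the slide makes about isolated sources.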
Context-aware Query Model
Ranking stages:
• Query Representation Ranking
• Query Refinement Ranking
• Data Source Discovery Ranking
• Query Composition Ranking
• Query Answer Ranking
• Evidence Ranking
• Answer Representation Ranking
Example query: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?” (Yes/No)
Related queries:
• “Is Warfarin sensitive to ethnic background?”
• “Does Warfarin have a narrow therapeutic range?”
• “What are the disjoint classes of population with respect to Warfarin?”
• “What are the adverse reactions of Warfarin?”
• “What is an effective dosage of Warfarin for preventing blood clots?”
“Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?”
Related queries:
• “What are the disjoint classes of population with respect to Warfarin?”
• “What is an effective dosage of Warfarin for preventing blood clots?”
• “Does Warfarin have a narrow therapeutic range?”
Retrieved answers:
• Dosage for African-American population: 6.1 mg
• Dosage for White population: 5.1 mg
• Dosage for Asian population: 3.4 mg
Open questions:
• Querying different sources returns 6.1 mg, 5.1 mg, & 3.4 mg, so is the data inconsistent? (revisiting the consistent answers formalism & possible world semantics)
• Given the known narrow therapeutic range, is 5.1 mg close enough to 5.0 mg? (fuzzy answers formalism in the presence of enriched data)
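
The two open questions can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the consistent-answers or fuzzy-answers formalism itself. The three dosage values come from the slide. The grouping helper, the linear scoring function, and the 10% relative tolerance are hypothetical choices made only to show the idea: the answers look contradictory when grouped by drug alone, stop conflicting once the population class is part of the key, and a near-match such as 5.0 mg versus 5.1 mg can be given a graded score instead of a hard yes/no.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DosageFact:
    drug: str
    population: str
    dose_mg: float


# Values taken from the slide.
FACTS = [
    DosageFact("Warfarin", "Asian", 3.4),
    DosageFact("Warfarin", "White", 5.1),
    DosageFact("Warfarin", "African-American", 6.1),
]


def conflicting_groups(facts, key):
    """Group facts by key; return groups holding more than one distinct dose."""
    groups = {}
    for fact in facts:
        groups.setdefault(key(fact), set()).add(fact.dose_mg)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def fuzzy_match(candidate_mg, reference_mg, rel_tol):
    """Score in [0, 1]: 1.0 for an exact match, 0.0 at or beyond rel_tol."""
    deviation = abs(candidate_mg - reference_mg) / reference_mg
    return max(0.0, 1.0 - deviation / rel_tol)


# Without context the three sources appear to disagree about Warfarin.
by_drug = conflicting_groups(FACTS, key=lambda f: f.drug)

# With the population class in the key, no group holds conflicting doses.
by_drug_and_population = conflicting_groups(
    FACTS, key=lambda f: (f.drug, f.population)
)

# Hypothetical 10% tolerance standing in for the "narrow therapeutic range".
REL_TOL = 0.10
scores = {f.population: round(fuzzy_match(5.0, f.dose_mg, REL_TOL), 2) for f in FACTS}

print(by_drug)
print(by_drug_and_population)
print(scores)
```

How tight the tolerance should be is exactly what the enriched data (the narrow therapeutic range) is meant to inform. The constant here is a placeholder.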
Spark Architecture: Knowledge Oblivious
Applications: Personalized Medicine (Drug Discovery/Safety); Computational Finance; Compliance Informatics
APIs/Services (Access/Interfaces): Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib
Processing Engine: Apache Spark (General Data Processing on Distributed Memory)
Data Model (Immutable Collection of Objects): Spark Data Model (Resilient Distributed Datasets, RDDs)
Storage: Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct)
Resource Virtualization: Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN)
ExpoDB Architecture: From Data to Knowledge
Applications: Personalized Medicine (Drug Discovery/Safety); Computational Finance; Compliance Informatics
APIs/Services (Access/Interfaces): Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib
Processing Engine: Reasoning, Refinement, Curation, Fusion, Discovery; Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP)
Data Model (Enriching Raw Data Towards Knowledge):
  Semantic Layer: Ontology, Rules, Stochastic Models, Tensor Embedding
  Relation Layer: Intra- & Inter-domain Linkage (fine-grained & instance-level)
  Instance Layer: Relational, Graph/RDF, Dense/Sparse Matrices, JSON
  Spark Data Model (RDDs); Generic Data Model (Key-Value Store)
Storage: Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct)
Resource Virtualization: Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN)
ExpoDB Architecture: Active Data Path
Same stack as the preceding ExpoDB architecture slide, with one addition under Resource Virtualization: Virtualized Hardware Acceleration (GPU & FPGA)
The First Step!
Projects overlaid on the ExpoDB architecture:
• L-Store (Real-time OLTP+OLAP)
• FQP (Flexible Query Processor)
• EmbedS (Ontology)
• Phenomenological Features (Deep-Learning-as-Oracle)
• PADRES (Event Processing)
• IBM DB2 BLU (Column Store)
• SPIDER (Declarative Data Cleansing)
• Vraph (Vectorized Graph Processing)
• Tiresias (Predicting Adverse Drug Reaction)
• fpga-ToPSS (Algorithmic Trading)
Thank You
Q&A
Exploratory Systems Lab (ExpoLab)
website: https://msadoghi.github.io/
References:
Data/Knowledge Exploration:
• Mohammad Sadoghi, Kavitha Srinivas, Oktie Hassanzadeh, Yuan-Chi Chang, Mustafa Canim, Achille Fokoue, Yishai A. Feldman: Self-Curating Databases. EDBT 2016
• Amit Chandel, Oktie Hassanzadeh, Nick Koudas, Mohammad Sadoghi, Divesh Srivastava: Benchmarking declarative approximate selection predicates. SIGMOD Conference 2007: 353-364
• Oktie Hassanzadeh, Mohammad Sadoghi, Renée J. Miller: Accuracy of Approximate String Joins Using Grams. QDB 2007
Drug Safety:
• Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, Ping Zhang: Predicting Drug-Drug Interactions Through Large-Scale Similarity-Based Link Prediction. ESWC 2016
• Achille Fokoue, Oktie Hassanzadeh, Mohammad Sadoghi, Ping Zhang: Predicting Drug-Drug Interactions Through Similarity-Based Link Prediction Over Web Data. WWW 2016
OLTP & OLAP:
• Mohammad Sadoghi, Souvik Bhattacherjee, Bishwaranjan Bhattacharjee, Mustafa Canim: L-Store: A Real-time OLTP and OLAP System. CoRR abs/1601.04084 (2016)
• Kaiwen Zhang, Mohammad Sadoghi, Hans-Arno Jacobsen: DL-Store: A Distributed Hybrid OLTP and OLAP Data Processing Engine. ICDCS 2016
• Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Exploiting SSDs in operational multiversion databases. VLDB J. 25(5): 651-672 (2016)
• Mohammad Sadoghi, Mustafa Canim, Bishwaranjan Bhattacharjee, Fabian Nagel, Kenneth A. Ross: Reducing Database Locking Contention Through Multi-version Concurrency. PVLDB 7(13): 1331-1342 (2014)
• Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: CaSSanDra: An SSD boosted key-value store. ICDE 2014: 1162-1167
• Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: Optimizing key-value stores for hybrid storage architectures. CASCON 2014: 355-358
• Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Making Updates Disk-I/O Friendly Using SSDs. PVLDB 6(11): 997-1008 (2013)
Hardware Acceleration:
• Rajesh R. Bordawekar, Mohammad Sadoghi: Accelerating database workloads by software-hardware-system co-design. ICDE 2016
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: SplitJoin: A Scalable, Low-latency Stream Join Architecture with Adjustable Ordering Precision. USENIX Annual Technical Conference 2016
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: The FQP Vision: Flexible Query Processing on a Reconfigurable Computing Fabric. SIGMOD Record 44(2): 5-10 (2015)
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Configurable hardware-based streaming architecture using Online Programmable-Blocks. ICDE 2015
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Flexible Query Processor on FPGAs. PVLDB 6(12): 1310-1313 (2013)
• Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh Singh, Rohan Palaniappan, Hans-Arno Jacobsen: Multi-query Stream Processing on FPGAs. ICDE 2012: 1229-1232
• Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: Towards highly parallel event processing through reconfigurable hardware. DaMoN 2011: 27-32
• Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: fpga-ToPSS: line-speed event processing on fpgas. DEBS 2011: 373-374
• Mohammad Sadoghi, Hans-Arno Jacobsen, Martin Labrecque, Warren Shum, Harsh Singh: Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. PVLDB 3(2): 1525-1528 (2010)

Weitere ähnliche Inhalte

Ähnlich wie "ExpoDB: An Exploratory Data Science Platform"

Connecting antimalarial data
Connecting antimalarial dataConnecting antimalarial data
Connecting antimalarial dataChris Southan
 
Data Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionPaul Groth
 
Generating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web TechnologiesGenerating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web TechnologiesMichel Dumontier
 
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...Warren Kibbe
 
PBSS sf 10-28-2016 flyer
PBSS sf 10-28-2016 flyerPBSS sf 10-28-2016 flyer
PBSS sf 10-28-2016 flyerVinita Gupta
 
Health 2.0 for UK SpRs
Health 2.0 for UK SpRsHealth 2.0 for UK SpRs
Health 2.0 for UK SpRsColin Mitchell
 
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...ExternalEvents
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-upopen_phacts
 
Genomics in Society: Genomics, Cellular Networks, Preventive Medicine, and So...
Genomics in Society: Genomics, Cellular Networks, Preventive Medicine, and So...Genomics in Society: Genomics, Cellular Networks, Preventive Medicine, and So...
Genomics in Society: Genomics, Cellular Networks, Preventive Medicine, and So...Larry Smarr
 
DNA Testing: Living Longer Via Personal Genomics
DNA Testing: Living Longer Via Personal GenomicsDNA Testing: Living Longer Via Personal Genomics
DNA Testing: Living Longer Via Personal GenomicsMelanie Swan
 
Transparency solutions ema disclosure for slide share
Transparency solutions  ema disclosure for slide shareTransparency solutions  ema disclosure for slide share
Transparency solutions ema disclosure for slide shareStephen Allan Weitzman
 
Dekker trog - knowledge engineering in radiation oncology - 2017
Dekker   trog  - knowledge engineering in radiation oncology - 2017Dekker   trog  - knowledge engineering in radiation oncology - 2017
Dekker trog - knowledge engineering in radiation oncology - 2017Andre Dekker
 
Advancing Translational Research With The Semantic Web
Advancing Translational Research With The Semantic WebAdvancing Translational Research With The Semantic Web
Advancing Translational Research With The Semantic WebJanelle Martinez
 
acs talk open source drug discovery
acs talk open source drug discoveryacs talk open source drug discovery
acs talk open source drug discoverySean Ekins
 
Preparing for Microbial Threats to Health: What Every Professional Should Know
Preparing for Microbial Threats to Health: What Every Professional Should KnowPreparing for Microbial Threats to Health: What Every Professional Should Know
Preparing for Microbial Threats to Health: What Every Professional Should KnowTomas J. Aragon
 
Technologies disrupting healthcare (webinar)
Technologies disrupting healthcare (webinar)Technologies disrupting healthcare (webinar)
Technologies disrupting healthcare (webinar)Ashish Advani
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataChirag Patel
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsJTADrexel
 

Ähnlich wie "ExpoDB: An Exploratory Data Science Platform" (20)

Connecting antimalarial data
Connecting antimalarial dataConnecting antimalarial data
Connecting antimalarial data
 
"ExpoDB: An Exploratory Data Science Platform"

  • 1. © 2016 Mohammad Sadoghi (Purdue University). ExpoDB: An Exploratory Data Science Platform (A New Frontier: From Data Processing to Knowledge Exploration). Mohammad Sadoghi, Assistant Professor, Department of Computer Science, Purdue University. IBM Cognitive Systems Institute Speaker Series, September 29, 2016.
  • 2. Insight is Lost in Islands of Data. Data is spread across many islands of disconnected sources (a lack of a holistic view). URLs shown on the slide: http://www.cpsresearch.eu/clinical-trials/ http://news.mit.edu/2015/mnookin-vaccination-public-health-0227 http://www.healthcarepackaging.com/trends-and-issues/clinical-trials http://stormercellularloo.gq/evolve-ii-clinical-trial.html https://www.geneticliteracyproject.org
  • 3. Insight is Lost in Islands of Data. Sadly, adverse drug reactions (ADRs) are the 4th leading cause of death in the United States, resulting in 100,000 deaths annually.
  • 4. Insight is Lost in Islands of Data. Adverse drug reactions cost over $136 billion annually in the US.
  • 5. Real-time Fusion and Exploration of Data
  • 6. Real-time Fusion and Exploration of Enriched Data
  • 7. Real-time Fusion and Exploration of Enriched Data at Web Scale
  • 8. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Diagram: a graph connecting drugs, genes, diseases, and chemical classes. Node labels: Approved Drugs, Naproxen (Aleve), Methotrexate, Warfarin, Nicotine, Aminopterin; PTGS2 (Gene), TP53 (Gene), DHFR (Gene), VKORC1 (Gene), CYP2C9 (Enzyme); Disease, Immune System, Autoimmune, Joint Diseases, Arthritis, Rheumatoid Arthritis, Neoplasms, Sarcoma, Osteosarcoma (Bone Cancer), Embolism (Blood Clot); Chemical, Carboxylic Acids, Heterocyclic, Phenylpropionates. Edge labels: inhibits, increased degradation, limits cell growth, tumor suppressor.] Why capture the semantics/context? Semantics is essential to connect the dots.
  • 9. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Same diagram and closing question as slide 8.]
  • 10. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Same diagram and closing question as slide 8.]
  • 11. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Same diagram and closing question as slide 8, with one "?" added.]
  • 12. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Same diagram and closing question as slide 8, with three "?" marks added.]
  • 13. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Same diagram as slide 8.] How to capture the context? (1) Instance Layer: capturing raw data instances, including both structured and semi-structured data.
  • 14. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Same diagram as slide 8.] How to capture the context? (2) Relation Layer: capturing the interconnectedness of data instances across data sources.
  • 15. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Same diagram as slide 8.] How to capture the context? (3) Semantic Layer: capturing conceptual relationships among data instances and their types.
  • 16. Enriched Data Model: Semantics is essential to connect the dots.
      Layer labels on the slide: Semantic layer, Relation layer, Instance layer; Information, Knowledge, Data.
      Data sources named on the slide: DrugBank (Bioinformatics & Cheminformatics Resource); CTD (Comparative Toxicogenomics Database); UniProt (Universal Protein Resource).
      Drug Name | Drug Targets (Genes) | Symptomatic Treatment
      Ibuprofen | PTGS2 | Rheumatoid Arthritis
      Acetaminophen | PTGS2 | Fever Relief
      Methotrexate | DHFR | Antineoplastic Anti-metabolite
      Warfarin | TP53 | Embolism (Blood Clot)
      Gene | Interaction
      PTGS2 | TP53 (Gene)
      Gene | Function
      TP53 | Tumor Suppressor
      DHFR | Limits Cell Growth
      Gene | Disease
      TP53 | Osteosarcoma
      Graph nodes: PTGS2 (Gene), TP53 (Gene), DHFR (Gene); Acetaminophen (Tylenol), Ibuprofen (Advil), Methotrexate, Warfarin; Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Fever Relief, Embolism (Blood Clot); Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms.
      Warfarin has a narrow therapeutic range (fatal outcomes). Dosage for Asian population: 3.4 mg. Dosage for White population: 5.1 mg. Dosage for African-American population: 6.1 mg.
  • 17. © 2016 Mohammad Sadoghi (Purdue University) Context-aware Query Model 17 Rank Query Representation Rank Query Refinement Rank Data Sources Discovery Rank Query Composition Rank Query Answers Rank Answer Evidence Rank Answer Representation Query Refinement Ranking Data Source Discovery Ranking Query Composition Ranking Query Answer Ranking Evidence Ranking Query Representation Ranking Answer Representation Ranking “Is 5.0 mg an effective dosage of Warfarin for preventing blood clot?” Yes/No
  • 18. © 2016 Mohammad Sadoghi (Purdue University) Context-aware Query Model 18 Rank Query Representation Rank Query Refinement Rank Data Sources Discovery Rank Query Composition Rank Query Answers Rank Answer Evidence Rank Answer Representation Query Refinement Ranking Data Source Discovery Ranking Query Composition Ranking Query Answer Ranking Evidence Ranking Query Representation Ranking Answer Representation Ranking “Is 5.0 mg an effective dosage of Warfarin for preventing blood clot?” Yes/No “Is Warfarin sensitive to ethnic background?”
  • 19. © 2016 Mohammad Sadoghi (Purdue University) Context-aware Query Model 19 Rank Query Representation Rank Query Refinement Rank Data Sources Discovery Rank Query Composition Rank Query Answers Rank Answer Evidence Rank Answer Representation Query Refinement Ranking Data Source Discovery Ranking Query Composition Ranking Query Answer Ranking Evidence Ranking Query Representation Ranking Answer Representation Ranking “Is 5.0 mg an effective dosage of Warfarin for preventing blood clot?” Yes/No “Is Warfarin sensitive to ethnic background?” “Does Warfarin have a narrow therapeutic range?”
  • 20. © 2016 Mohammad Sadoghi (Purdue University) Context-aware Query Model 20 Rank Query Representation Rank Query Refinement Rank Data Sources Discovery Rank Query Composition Rank Query Answers Rank Answer Evidence Rank Answer Representation Query Refinement Ranking Data Source Discovery Ranking Query Composition Ranking Query Answer Ranking Evidence Ranking Query Representation Ranking Answer Representation Ranking “Is 5.0 mg an effective dosage of Warfarin for preventing blood clot?” Yes/No “Is Warfarin sensitive to ethnic background?” “Does Warfarin have a narrow therapeutic range?” “What are the disjoint classes of population with respect to Warfarin?”
  • 21. © 2016 Mohammad Sadoghi (Purdue University) Context-aware Query Model 21 Rank Query Representation Rank Query Refinement Rank Data Sources Discovery Rank Query Composition Rank Query Answers Rank Answer Evidence Rank Answer Representation Query Refinement Ranking Data Source Discovery Ranking Query Composition Ranking Query Answer Ranking Evidence Ranking Query Representation Ranking Answer Representation Ranking “Is 5.0 mg an effective dosage of Warfarin for preventing blood clot?” Yes/No “Is Warfarin sensitive to ethnic background?” “Does Warfarin have a narrow therapeutic range?” “What are the disjoint classes of population with respect to Warfarin?” “What are the adverse reactions of Warfarin?”
  • 22. Context-aware Query Model. Ranking stages: Query Representation, Query Refinement, Data Source Discovery, Query Composition, Query Answers, Answer Evidence, Answer Representation. Queries: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?” (Yes/No); “Is Warfarin sensitive to ethnic background?”; “Does Warfarin have a narrow therapeutic range?”; “What are the disjoint classes of population with respect to Warfarin?”; “What are the adverse reactions of Warfarin?”; “What is an effective dosage of Warfarin for preventing blood clots?”
  • 23. Context-aware Query Model. Queries: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?”; “What are the disjoint classes of population with respect to Warfarin?”; “What is an effective dosage of Warfarin for preventing blood clots?”; “Does Warfarin have a narrow therapeutic range?”
  • 24. Context-aware Query Model. Queries: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?”; “What are the disjoint classes of population with respect to Warfarin?”; “What is an effective dosage of Warfarin for preventing blood clots?”; “Does Warfarin have a narrow therapeutic range?” Answers: dosage for the African-American population: 6.1 mg; dosage for the White population: 5.1 mg; dosage for the Asian population: 3.4 mg.
  • 25. Context-aware Query Model. Queries: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?”; “What are the disjoint classes of population with respect to Warfarin?”; “What is an effective dosage of Warfarin for preventing blood clots?”; “Does Warfarin have a narrow therapeutic range?” Answers: dosage for the African-American population: 6.1 mg; dosage for the White population: 5.1 mg; dosage for the Asian population: 3.4 mg. Open question: querying different sources returns 6.1 mg, 5.1 mg, and 3.4 mg, so is the data inconsistent? (revisiting the consistent-answers formalism and possible-world semantics)
  • 26. Context-aware Query Model. Queries: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?”; “What are the disjoint classes of population with respect to Warfarin?”; “What is an effective dosage of Warfarin for preventing blood clots?”; “Does Warfarin have a narrow therapeutic range?” Answers: dosage for the African-American population: 6.1 mg; dosage for the White population: 5.1 mg; dosage for the Asian population: 3.4 mg. Open questions: querying different sources returns 6.1 mg, 5.1 mg, and 3.4 mg, so is the data inconsistent? (revisiting the consistent-answers formalism and possible-world semantics) Given the known narrow therapeutic range, is 5.1 mg close enough to 5.0 mg? (fuzzy-answers formalism in the presence of enriched data)
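The two open questions on slides 25 and 26 can be made concrete with a small sketch. This is only an illustration, not ExpoDB's formalism: the `Answer` type, the `conflicts` and `fuzzy_matches` helpers, and the tolerance values are all hypothetical. The dosages are the ones shown on the slides. The idea is that answers attached to disjoint population classes are not compared with each other, and that "close enough" is decided against an explicit tolerance that would, in practice, come from knowledge about the therapeutic range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    population: str   # disjoint population class the answer applies to
    dosage_mg: float  # dosage reported for that class


# Dosages as listed on the slides; the sources are not named there.
ANSWERS = [
    Answer("African-American", 6.1),
    Answer("White", 5.1),
    Answer("Asian", 3.4),
]


def conflicts(answers):
    """Return the populations for which two answers disagree.

    Answers for different (disjoint) populations are never compared,
    so 6.1, 5.1 and 3.4 mg do not count as an inconsistency.
    """
    first_seen = {}
    clashing = set()
    for a in answers:
        if a.population in first_seen and first_seen[a.population] != a.dosage_mg:
            clashing.add(a.population)
        first_seen.setdefault(a.population, a.dosage_mg)
    return clashing


def fuzzy_matches(candidate_mg, answers, tolerance_mg):
    """Populations whose reported dosage is within tolerance_mg of the candidate.

    tolerance_mg is a hypothetical parameter standing in for knowledge
    about how narrow the therapeutic range is.
    """
    eps = 1e-9  # guard against floating-point rounding at the boundary
    return [a.population for a in answers
            if abs(a.dosage_mg - candidate_mg) <= tolerance_mg + eps]


if __name__ == "__main__":
    print(conflicts(ANSWERS))
    print(fuzzy_matches(5.0, ANSWERS, tolerance_mg=0.2))
```

With a tolerance of 0.2 mg the 5.0 mg candidate matches only the White population; tightening the tolerance to 0.05 mg leaves no match, which is the situation the "narrow therapeutic range" question is pointing at.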
  • 27. Spark Architecture: Knowledge Oblivious. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Immutable Collection of Objects); Storage; Resource Virtualization. Components: Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Apache Spark (General Data Processing on Distributed Memory); Spark Data Model (Resilient Distributed Datasets, RDDs); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib. Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 28. Spark Architecture: Knowledge Oblivious. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Immutable Collection of Objects); Storage; Resource Virtualization. Components: Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Apache Spark (General Data Processing on Distributed Memory); Spark Data Model (Resilient Distributed Datasets, RDDs); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib. Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 29. ExpoDB Architecture: From Data to Knowledge. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Enriching Raw Data Towards Knowledge); Storage; Resource Virtualization. Components: Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON); Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Apache Spark (General Data Processing on Distributed Memory). Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 30. ExpoDB Architecture: From Data to Knowledge. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Enriching Raw Data Towards Knowledge); Storage; Resource Virtualization. Components: Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Relation Layer (Intra- & Inter-domain Linkage, fine-grained & instance-level); Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON); Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Apache Spark (General Data Processing on Distributed Memory). Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 31. ExpoDB Architecture: From Data to Knowledge. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Enriching Raw Data Towards Knowledge); Storage; Resource Virtualization. Components: Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Semantic Layer (Ontology, Rules, Stochastic Models, Tensor Embedding); Relation Layer (Intra- & Inter-domain Linkage, fine-grained & instance-level); Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON); Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Apache Spark (General Data Processing on Distributed Memory). Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 32. ExpoDB Architecture: From Data to Knowledge. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Enriching Raw Data Towards Knowledge); Storage; Resource Virtualization. Components: Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Semantic Layer (Ontology, Rules, Stochastic Models, Tensor Embedding); Relation Layer (Intra- & Inter-domain Linkage, fine-grained & instance-level); Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON); Spark Data Model (RDDs); Generic Data Model (Key-Value Store); Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Apache Spark (General Data Processing on Distributed Memory). Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
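As a rough illustration of how the three data-model layers on this slide could sit on top of a generic key-value store, here is a toy sketch. Everything in it is an assumption made for illustration: the keys, record fields, the `link_by_name` and `enrich` helpers, and the tiny ontology are hypothetical and are not taken from ExpoDB. The instance layer holds raw records, the relation layer adds instance-level links across sources, and the semantic layer attaches ontology knowledge to linked records.

```python
# Instance layer: raw records in a key-value store, keyed by (source, id).
# The identifiers and fields below are made up for this example.
instances = {
    ("drugs", "d1"): {"name": "Warfarin"},
    ("trials", "t7"): {"drug_name": "warfarin", "population": "Asian",
                       "dosage_mg": 3.4},
}

# Semantic layer input: a minimal ontology mapping a drug to its class.
ontology = {"Warfarin": "Anticoagulant"}


def link_by_name(store):
    """Relation layer: link trial records to drug records by drug name.

    Case-insensitive name equality is a deliberately naive linkage rule.
    """
    drugs = {k: v for k, v in store.items() if k[0] == "drugs"}
    links = []
    for key, record in store.items():
        if key[0] != "trials":
            continue
        for drug_key, drug in drugs.items():
            if drug["name"].lower() == record["drug_name"].lower():
                links.append((key, "about", drug_key))
    return links


def enrich(store, links, ontology):
    """Semantic layer: annotate each linked trial with the drug's class."""
    enriched = []
    for trial_key, _, drug_key in links:
        drug_name = store[drug_key]["name"]
        record = dict(store[trial_key])  # copy; raw instances stay untouched
        record["drug_class"] = ontology.get(drug_name)
        enriched.append(record)
    return enriched


if __name__ == "__main__":
    links = link_by_name(instances)
    print(links)
    print(enrich(instances, links, ontology))
```

The raw records are never modified; each layer only adds derived information, which mirrors the slide's framing of enriching raw data towards knowledge.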
  • 33. ExpoDB Architecture: From Data to Knowledge. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Enriching Raw Data Towards Knowledge); Storage; Resource Virtualization. Components: Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Reasoning, Refinement, Curation, Fusion, Discovery; Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP); Semantic Layer (Ontology, Rules, Stochastic Models, Tensor Embedding); Spark Data Model (RDDs); Generic Data Model (Key-Value Store); Relation Layer (Intra- & Inter-domain Linkage, fine-grained & instance-level); Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON). Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 34. ExpoDB Architecture: Active Data Path. Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Enriching Raw Data Towards Knowledge); Storage; Resource Virtualization. Components: Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Reasoning, Refinement, Curation, Fusion, Discovery; Semantic Layer (Ontology, Rules, Stochastic Models, Tensor Embedding); Spark Data Model (RDDs); Generic Data Model (Key-Value Store); Relation Layer (Intra- & Inter-domain Linkage, fine-grained & instance-level); Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON); Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Virtualized Hardware Acceleration (GPU & FPGA); Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP). Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 35. The First Step! Layers: Applications; APIs/Services (Access/Interfaces); Processing Engine; Data Model (Enriching Raw Data Towards Knowledge); Storage; Resource Virtualization. Components: Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Reasoning, Refinement, Curation, Fusion, Discovery; Semantic Layer (Ontology, Rules, Stochastic Models, Tensor Embedding); Spark Data Model (RDDs); Generic Data Model (Key-Value Store); Relation Layer (Intra- & Inter-domain Linkage, fine-grained & instance-level); Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON); Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct); Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN); Virtualized Hardware Acceleration (GPU & FPGA); Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP). Systems: L-Store (Real-time OLTP+OLAP); FQP (Flexible Query Processor); EmbedS (Ontology); Phenomenological Features (Deep-Learning-as-Oracle); PADRES (Event Processing); IBM DB2 BLU (Column Store); SPIDER (Declarative Data Cleansing); Vraph (Vectorized Graph Processing); Tiresias (Predicting Adverse Drug Reaction); fpga-ToPSS (Algorithmic Trading). Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
  • 36. Thank You. Q&A. Exploratory Systems Lab (ExpoLab) website: https://msadoghi.github.io/
  • 37. References
    Data/Knowledge Exploration:
    • Mohammad Sadoghi, Kavitha Srinivas, Oktie Hassanzadeh, Yuan-Chi Chang, Mustafa Canim, Achille Fokoue, Yishai A. Feldman: Self-Curating Databases. EDBT 2016
    • Amit Chandel, Oktie Hassanzadeh, Nick Koudas, Mohammad Sadoghi, Divesh Srivastava: Benchmarking declarative approximate selection predicates. SIGMOD Conference 2007: 353-364
    • Oktie Hassanzadeh, Mohammad Sadoghi, Renée J. Miller: Accuracy of Approximate String Joins Using Grams. QDB 2007
    Drug Safety:
    • Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, Ping Zhang: Predicting Drug-Drug Interactions Through Large-Scale Similarity-Based Link Prediction. ESWC 2016
    • Achille Fokoue, Oktie Hassanzadeh, Mohammad Sadoghi, Ping Zhang: Predicting Drug-Drug Interactions Through Similarity-Based Link Prediction Over Web Data. WWW 2016
    OLTP & OLAP:
    • Mohammad Sadoghi, Souvik Bhattacherjee, Bishwaranjan Bhattacharjee, Mustafa Canim: L-Store: A Real-time OLTP and OLAP System. CoRR abs/1601.04084 (2016)
    • Kaiwen Zhang, Mohammad Sadoghi, Hans-Arno Jacobsen: DL-Store: A Distributed Hybrid OLTP and OLAP Data Processing Engine. ICDCS 2016
    • Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Exploiting SSDs in operational multiversion databases. VLDB J. 25(5): 651-672 (2016)
    • Mohammad Sadoghi, Mustafa Canim, Bishwaranjan Bhattacharjee, Fabian Nagel, Kenneth A. Ross: Reducing Database Locking Contention Through Multi-version Concurrency. PVLDB 7(13): 1331-1342 (2014)
    • Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: CaSSanDra: An SSD boosted key-value store. ICDE 2014: 1162-1167
    • Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: Optimizing key-value stores for hybrid storage architectures. CASCON 2014: 355-358
    • Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Making Updates Disk-I/O Friendly Using SSDs. PVLDB 6(11): 997-1008 (2013)
    Hardware Acceleration:
    • Rajesh R. Bordawekar, Mohammad Sadoghi: Accelerating database workloads by software-hardware-system co-design. ICDE 2016
    • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: SplitJoin: A Scalable, Low-latency Stream Join Architecture with Adjustable Ordering Precision. USENIX Annual Technical Conference 2016
    • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: The FQP Vision: Flexible Query Processing on a Reconfigurable Computing Fabric. SIGMOD Record 44(2): 5-10 (2015)
    • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Configurable hardware-based streaming architecture using Online Programmable-Blocks. ICDE 2015
    • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Flexible Query Processor on FPGAs. PVLDB 6(12): 1310-1313 (2013)
    • Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh Singh, Rohan Palaniappan, Hans-Arno Jacobsen: Multi-query Stream Processing on FPGAs. ICDE 2012: 1229-1232
    • Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: Towards highly parallel event processing through reconfigurable hardware. DaMoN 2011: 27-32
    • Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: fpga-ToPSS: line-speed event processing on fpgas. DEBS 2011: 373-374
    • Mohammad Sadoghi, Hans-Arno Jacobsen, Martin Labrecque, Warren Shum, Harsh Singh: Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. PVLDB 3(2): 1525-1528 (2010)