Graphium Chrysalis: Exploiting Graph Database Engines to Analyze RDF Graphs
Alejandro Flores
Maria-Esther Vidal
Guillermo Palma
Universidad Simón Bolívar
Graph-TA 2015
Agenda
- Motivation
- Graphium
- Graph Invariants in Graphium
Resource Description Framework (RDF) Model
An RDF statement is a triple: Subject, Predicate, Object.
Properties and relationships are represented as predicates.
[Figure: RDF graph of The Beatles and the albums Let it be, Revolver, and Help!, with the predicates created, year, and duration, and the values 1970, 1965, 1966, 35:16, 35:01, Liverpool, and thebeatles.com.]
Source: “Scaling Up Linked Data”, EUCLID project.
Semantic Data Management
RDF Graphs
RDF Engines: SPO, SOP, PSO, POS, OSP, OPS
SPARQL queries that represent graph patterns
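The six orderings listed for RDF engines are the permutations of subject (S), predicate (P), and object (O). The following is a minimal sketch of how such permutation indexes can be built and used to answer a triple pattern. The nested-dictionary layout, the function name `build_indexes`, and the sample triples are illustrative assumptions, not the design of any particular RDF engine.

```python
from collections import defaultdict
from itertools import permutations

# Sample triples taken from the labels of the RDF example slide.
TRIPLES = [
    ("The Beatles", "created", "Let it be"),
    ("The Beatles", "created", "Revolver"),
    ("The Beatles", "created", "Help!"),
]


def build_indexes(triples):
    """Build one nested index per ordering of S, P, O (hypothetical layout)."""
    indexes = {}
    for order in permutations("SPO"):
        index = defaultdict(lambda: defaultdict(set))
        for s, p, o in triples:
            term = {"S": s, "P": p, "O": o}
            first, second, third = (term[pos] for pos in order)
            index[first][second].add(third)
        indexes["".join(order)] = index
    return indexes


indexes = build_indexes(TRIPLES)

# Pattern (?s, created, Revolver): predicate and object are bound,
# so the POS index answers it with two dictionary lookups.
print(indexes["POS"]["created"]["Revolver"])
```

Keeping all six orderings means that any triple pattern with bound positions has an index whose leading keys are exactly those positions, at the cost of storing each triple six times.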
Property Graph Model
- Nodes and edges may have properties
- Properties: Key-value pairs
[Figure: property graph of The Beatles (Origin: Liverpool; Homepage: thebeatles.com) with created edges to the albums Let it be, Revolver, and Help!; album properties shown are Year: 1970, Duration: 35:16, Year: 1965, Year: 1966, and Duration: 35:01.]
Source: “Scaling Up Linked Data”, EUCLID project.
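The difference between the two models can be sketched in a few lines: in RDF every fact is a triple, while in a property graph literal-valued facts become key-value properties on a node and only resource-to-resource facts remain edges. The sketch below is an illustration under assumptions: the function name `to_property_graph`, the set `LITERAL_PREDICATES`, and the assignment of years and durations to individual albums are a reading of the example figure, not something the slides state explicitly.

```python
# Predicates whose objects are treated as literal values (assumption).
LITERAL_PREDICATES = {"year", "duration", "origin", "homepage"}

# Triples read off the example figure; the album/value pairing is assumed.
TRIPLES = [
    ("The Beatles", "created", "Let it be"),
    ("The Beatles", "created", "Revolver"),
    ("The Beatles", "created", "Help!"),
    ("The Beatles", "origin", "Liverpool"),
    ("The Beatles", "homepage", "thebeatles.com"),
    ("Let it be", "year", "1970"),
    ("Let it be", "duration", "35:16"),
    ("Help!", "year", "1965"),
    ("Revolver", "year", "1966"),
    ("Revolver", "duration", "35:01"),
]


def to_property_graph(triples):
    """Fold literal-valued triples into node properties; keep the rest as edges."""
    nodes = {}   # node name -> dict of key-value properties
    edges = []   # (source, label, target)
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if p in LITERAL_PREDICATES:
            nodes[s][p] = o
        else:
            nodes.setdefault(o, {})
            edges.append((s, p, o))
    return nodes, edges


nodes, edges = to_property_graph(TRIPLES)
print(nodes["Revolver"])
print(edges)
```

With this mapping the ten triples collapse into four nodes and three edges, which is why neighborhood-style operations touch far fewer elements in the property graph.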
Semantic Data Management
RDF Graphs | RDF Engines (SPO, SOP, PSO, POS, OSP, OPS) | SPARQL queries that represent graph patterns
Property Graphs | Graph Database Engines | Edges & nodes, neighborhoods, graph-based tasks
Benchmark of Graphs (COLD 2013)
Graph Name | #Nodes | #Edges | Density | #Labels
DSJC1000.1 [Johnson91] | 1,000 | 99,258 | 0.099 | 1
DSJC1000.5 [Johnson91] | 1,000 | 499,652 | 0.50 | 1
DSJC1000.9 [Johnson91] | 1,000 | 898,898 | 0.899 | 1
USA-road-d.NY | 264,346 | 730,100 | 0.00001045 | 7,970
USA-road-d.FLA | 1,070,376 | 2,687,902 | 0.00000235 | 22,704
Berlin10M | 2,743,235 | 9,709,119 | 0.00000129 | 40
[Johnson91] Johnson, D., Aragon, C., McGeoch, L., and Schevon, C. Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning. Operations Research 39, 3 (1991), 378–406.
USA-road-d.* graphs: 9th DIMACS Implementation Challenge - Shortest Paths, http://www.dis.uniroma1.it/challenge9/download.shtml
Berlin10M: Berlin Benchmark, http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
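The density column can be checked from the node and edge counts. The sketch below assumes density is the number of edges divided by the number of possible edges in a complete digraph without self-loops, n(n-1); the function name `digraph_density` is hypothetical. Under that assumption the computed values agree with the table up to rounding or truncation in the last printed digit.

```python
def digraph_density(num_nodes: int, num_edges: int) -> float:
    """Edges divided by the n * (n - 1) possible edges of a complete digraph."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# (nodes, edges) copied from the benchmark table.
BENCHMARK = {
    "DSJC1000.1": (1_000, 99_258),
    "DSJC1000.5": (1_000, 499_652),
    "DSJC1000.9": (1_000, 898_898),
    "USA-road-d.NY": (264_346, 730_100),
    "USA-road-d.FLA": (1_070_376, 2_687_902),
    "Berlin10M": (2_743_235, 9_709_119),
}

for name, (n, m) in BENCHMARK.items():
    print(f"{name:15s} {digraph_density(n, m):.8f}")
```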
Adjacency Tests
Triple Pattern based Tests
K-Hop Tests
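The slides do not spell out how a k-hop test is defined. A minimal sketch, assuming the common meaning (all vertices reachable from a start vertex by following at most k out-going edges), is a breadth-first search that stops expanding at depth k. The function name `k_hop` and the sample edges are hypothetical.

```python
from collections import defaultdict, deque


def k_hop(edges, start, k):
    """Return {vertex: hop distance} for vertices within k out-going hops of start."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if distance[vertex] == k:
            continue  # do not expand beyond depth k
        for neighbor in adjacency[vertex]:
            if neighbor not in distance:
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)

    del distance[start]  # report only the neighborhood, not the start vertex
    return distance


EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
print(k_hop(EDGES, "a", 1))
print(k_hop(EDGES, "a", 2))
```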
GRAPHIUM
[Figure: GRAPHIUM architecture. Components shown: Neo4j and Sparksee; Graph-based API and RDF-based API; Data Mining, Traversal API, and Graph Invariants.]
GRAPHIUM: http://graphium.ldc.usb.ve/
Graph Invariants
Invariant | Description
Vertex and Edge Count | Number of vertices and edges in the graph.
Graph Density | Number of edges in the graph divided by the number of possible edges in a complete digraph.
Reciprocity | Measures the extent to which a triple that relates resources A and B is reciprocated by another triple that relates B with A.
In- and Out-degree Distribution | Distribution of the number of in-coming and out-going edges of the vertices of a graph.
In-coming and Out-going H-index | h is the maximum number such that h vertices each have at least h in-coming neighbors (resp., out-going neighbors) in the graph.
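Written as formulas, the density and H-index descriptions above read as follows. The notation is an assumption introduced here: G = (V, E) is a directed graph without self-loops, and N^{+}(v) and N^{-}(v) are the sets of out-going and in-coming neighbors of v. The reciprocity formula is one common way to quantify the description (the fraction of edges whose reverse edge is also present); the slides do not give an explicit formula.

```latex
\[
\mathrm{density}(G) = \frac{|E|}{|V|\,(|V|-1)},
\qquad
\mathrm{reciprocity}(G) = \frac{\bigl|\{(a,b) \in E : (b,a) \in E\}\bigr|}{|E|}
\]
\[
h_{\mathrm{out}}(G) = \max\bigl\{\, h : \bigl|\{ v \in V : |N^{+}(v)| \ge h \}\bigr| \ge h \,\bigr\},
\qquad
h_{\mathrm{in}}(G) = \max\bigl\{\, h : \bigl|\{ v \in V : |N^{-}(v)| \ge h \}\bigr| \ge h \,\bigr\}
\]
```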
Graph Invariants: Reciprocity
Reciprocal edges indicate stronger relationships between vertices.
[Figure: Drugbank and Diseasome linked by drugbank:possibleDiseaseTarget (Drugbank to Diseasome) and diseasome:possibleDrug (Diseasome to Drugbank).]
Example:
drugbank:DB00157 drugbank:possibleDiseaseTarget diseasome:diseases/0, diseasome:diseases/1, diseasome:diseases/4198, …
diseasome:diseases/0 diseasome:possibleDrug drugbank:DB00157
diseasome:diseases/1 diseasome:possibleDrug drugbank:DB00157
Reciprocity values less than 1.0 indicate that there are drugs associated with diseases that do not have their reciprocal link.
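A minimal sketch of this check, assuming reciprocity is computed as the fraction of directed (subject, object) pairs whose reverse pair also exists, with predicate labels ignored. The function names and the five-edge sample are illustrative; the sample uses only the resources named in the example, not the full Drugbank and Diseasome link sets.

```python
def reciprocity(edges):
    """Fraction of directed edges (a, b) for which (b, a) is also an edge."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return reciprocated / len(edge_set)


def missing_reciprocals(edges):
    """Edges that have no reverse edge, i.e. candidate missing links."""
    edge_set = set(edges)
    return sorted((a, b) for a, b in edge_set if (b, a) not in edge_set)


# Toy subset built from the resources named in the example.
LINKS = [
    ("drugbank:DB00157", "diseasome:diseases/0"),
    ("drugbank:DB00157", "diseasome:diseases/1"),
    ("drugbank:DB00157", "diseasome:diseases/4198"),
    ("diseasome:diseases/0", "drugbank:DB00157"),
    ("diseasome:diseases/1", "drugbank:DB00157"),
]

print(reciprocity(LINKS))
print(missing_reciprocals(LINKS))
```

In this toy subset four of the five edges are reciprocated, and the unreciprocated one points at the disease whose possibleDrug link back to the drug is absent.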
Reciprocity can be used to determine Data Quality and Completeness
H-Index Sets
[Figure: example graph with subject vertices S1 to S5, object vertices O1 to O4, and edges labeled P1 to P8.]
H-Index Set Out
A set F of vertices, where H is the maximum number such that the vertices in F each have at least H out-going neighbors.
Example: F = {S1, S2, S3}; the vertices in F have 3, 3, and 2 out-going neighbors, so 2 is the maximum number such that the vertices in F each have at least 2 out-going neighbors.
H-Index Set In
A set F of vertices, where H is the maximum number such that the vertices in F each have at least H in-coming neighbors.
Example: F = {O1, O2, O3}; the vertices in F each have 3 in-coming neighbors, so 3 is the maximum number such that the vertices in F each have at least 3 in-coming neighbors.
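The two definitions can be sketched as one function: compute H from the sorted neighbor counts, then collect every vertex with at least H neighbors. This is a sketch under assumptions, not Graphium's implementation: the function name `h_index_set` is hypothetical, and the edge list is invented so that it reproduces the two results shown (H = 2 with F = {S1, S2, S3} for out-going neighbors, H = 3 with F = {O1, O2, O3} for in-coming neighbors); the edges of the original figure are not recoverable.

```python
from collections import defaultdict


def h_index_set(edges, direction="out"):
    """Return (H, F): the H-index and the set of vertices with at least H neighbors."""
    neighbors = defaultdict(set)
    for source, target in edges:
        if direction == "out":
            neighbors[source].add(target)
        else:
            neighbors[target].add(source)

    # H is the largest rank r such that the r-th highest neighbor count is >= r.
    counts = sorted((len(n) for n in neighbors.values()), reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count < rank:
            break
        h = rank

    members = {v for v, n in neighbors.items() if len(n) >= h} if h else set()
    return h, members


# Hypothetical edges chosen to match the two examples.
EDGES = [
    ("S1", "O1"), ("S1", "O2"), ("S1", "O3"),
    ("S2", "O1"), ("S2", "O2"), ("S2", "O3"),
    ("S3", "O1"), ("S3", "O2"),
    ("S4", "O3"),
    ("S5", "O4"),
]

print(h_index_set(EDGES, "out"))
print(h_index_set(EDGES, "in"))
```

Note that F can contain more than H vertices, since every vertex that reaches the threshold is included.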
Graph invariants
SELECT DISTINCT *
WHERE {
?s drugbank:drugCategory <http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugcategory/micronutrient>.
?s drugbank:target ?o.
?o drugbank:drugReference ?o2.
?o drugbank:goClassificationComponent ?o3
}
Drugbank SPARQL endpoint times out
“References and GO annotations of the targets associated with the Micro Nutrient Drugs”
2-hop of Micro Nutrient Drugs
[Figure: 48 Drugs linked to 685 Targets, which are in turn linked to References and GO Terms.]
H-Index Out: 10 Drugs have at least 57 out-going links.
H-Index Out: 47 Targets have at least 57 out-going links.
H-Index In: 6 References have at least 21 in-coming links.
H-Index Sets can be used to explain query complexity
H-Index Sets to Validate Potential Novel Associations
H-Index Sets
Network of Targets and Drugs
Targets: a set F of targets, where H is the maximum number such that the targets in F each have at least H out-going neighbors.
Drugs: a set F of drugs, where H is the maximum number such that the drugs in F each have at least H in-coming neighbors.
Set of Targets and Drugs
- 900 Drugs, 1,000 Targets, and 5,000 Interactions: nuclear receptors, G protein-coupled receptors (GPCRs), ion channels, and enzymes.
- DrugBank
K. Bleakley and Y. Yamanishi. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics, 25(18), 2009.
GPCR
Drugs | 223
Targets | 95
Interactions | 635
Avg. interactions per target | 6.68
Avg. interactions per drug | 2.84
Drugbank drugs in the dataset of G protein-coupled receptors (GPCRs)
H-index Out is 14
15 Targets are in the H-Index Set Out
F = {hsa:1128, hsa:1129, hsa:146, hsa:147, hsa:148, hsa:150, hsa:151, hsa:152, hsa:153, hsa:154, hsa:155, hsa:1812, hsa:1813, hsa:3269, hsa:3356}
H-Index Sets
Associations between drugs and targets that are not in Drugbank:
D02076 hsa:146
D02076 hsa:147
D00604 hsa:147
These targets belong to the H-index set, and the associations were validated in STITCH (http://stitch.embl.de/).
H-Index Sets can be used to validate the discovered associations.
Visit our website:
http://graphium.ldc.usb.ve/
Conclusions
Graph Invariants:
- Are the same for any two isomorphic graphs, regardless of representation.
- Allow for uncovering hidden properties of the graphs.
Reciprocity, Density, and H-Index Set:
- Reciprocity can suggest data quality and incompleteness.
- Density can be used to explain the complexity of graph tasks.
- H-index sets can comprise entities useful for discovering potential novel associations.

Graphium Chrysalis: Exploiting Graph Database

  • 1. Graphium Chrysalis: Exploiting Graph Database Engines to Analyze RDF Graphs Alejandro Flores Maria-Esther Vidal Guillermo Palma Universidad Simón Bolívar 1Graph-TA 2015
  • 2. Agenda  Motivation  Graphium  Graph Invariants in Graphium Graph-TA 2015 2
  • 3. Resource Description Framework (RDF) Model 3 Subject Object Predicate
  • 4. Resource Description Framework (RDF) Model 4 duration duration Properties and Relationships are represented as predicates The Beatles Let it be Revolver Help! created 1970 35:16 1965 year 1966 35:01 Liverpool thebeatles.com Subject Object Predicate Source: “Scaling Up Linked Data”. EUCLID project.
  • 5. Semantic Data Management RDF Graphs RDF Engines S P O S OP PSO POS OSP OPS SPARQL queries that represent Graph patterns
  • 6. Property Graph Model 6  Nodes and edges may have properties  Properties: Key-value pairs The Beatles Let it be Revolver Help! created Year: 1970 Duration: 35:16 Year: 1965 Year: 1966 Duration: 35:01 Homepage: thebeatles.com Origin: Liverpool Source: “Scaling Up Linked Data”. EUCLID project.
  • 7. Semantic Data Management Property Graphs Graph Database Engines Edges & Nodes Neighborhoods Graph-based tasks
  • 8. Semantic Data Management RDF Graphs RDF Engines S P O S OP PSO POS OSP OPS Property Graphs Graph Database Engines SPARQL queries that represent Graph patterns Edges & Nodes Neighborhoods Graph-based tasks
  • 9. 9 Benchmark of Graph Graph Name #Nodes #Edges Density #Labels DSJC1000.1 [Johnson91] 1,000 99,258 0.099 1 DSJC1000.5 [Johnson91] 1,000 499,652 0.50 1 DSJC1000.9 [Johnson91] 1,000 898,898 0.899 1 USA-road- d.NY 264,346 730,100 0.00001045 7,970 USA-road- d.FLA 1,070,376 2,687,902 0.00000235 22,704 Berlin10M 2,743,235 9,709,119 0.00000129 40 [Johnson91] Johnson, D., Aragon, C., McGeoch, L., and Schevon, C. Optimization by simulated annealing: an experimental evaluation; part ii, graph coloring and number partitioning. Operations research 39, 3 (1991), 378–406. USA-road-d* Graphs 9th DIMACS Implementation Challenge - Shortest Paths http://www.dis.uniroma1.it/challenge9/download.shtml Berlin10M: Berlin Bechmark-http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/ COLD 2013
  • 12. Semantic Data Management RDF Graphs RDF Engines S P O S OP PSO POS OSP OPS Property Graphs Graph Database Engines SPARQL queries that represent Graph patterns Edges & Nodes Neighborhoods Graph-based tasks
  • 13. 13 GRAPHIUM Neo4j Sparksee Graph-based API RDF-based API GRAPHIUM: http://graphium.ldc.usb.ve
  • 15. 15 GRAPHIUM Neo4j Sparksee Graph-based API RDF-based API Data Mining Traversal API Graph Invariants GRAPHIUM: http://graphium.ldc.usb.ve
  • 16. 16
  • 17. Graph Invariants 17 Invariant Description Vertex and Edge Count number of vertices and edges in the graph. Graph Density number of edges in the graph divided by the number of possible edges in a complete digraph. Reciprocity Reciprocity measures the extend to which a triple that relates resources A and B is reciprocated by a another triple that relates B with A too. In- and Out-degree Distribution Distribution of the number of in-coming and out-going edges of the vertices of a graph. In-coming and Out-going H-index h is the maximum number, such that h vertices have each at least h in-coming neighbors (resp., out-going neighbors) in the graph.
  • 18. Reciprocity: Reciprocal edges indicates stronger relationships between vertices. Graph invariants diseasome:possibleDrug Drugbank Diseasome drugbank:possibleDiseaseTarget
  • 19. Reciprocity: Reciprocal edges indicates stronger relationships between vertices. Graph invariants Drugbank Diseasome diseasome:possibleDrug drugbank:possibleDiseaseTarget drugbank:DB00157 drugbank:possibleDiseaseTarget diseasome:diseases/0 diseasome:diseases/1 diseasome:diseases/4198 … diseasome:diseases/0 diseasome:possibleDrug drugbank:DB00157 diseasome:diseases/1 drugbank:DB00157 Reciprocity values less than 1.0 indicates that there are drugs associated with diseases that do not have their reciprocal link.
  • 20. Reciprocity: Reciprocal edges indicates stronger relationships between vertices. Graph invariants Drugbank Diseasome diseasome:possibleDrug drugbank:possibleDiseaseTarget drugbank:DB00157 drugbank:possibleDiseaseTarget diseasome:diseases/0 diseasome:diseases/1 diseasome:diseases/4198 … diseasome:diseases/0 diseasome:possibleDrug drugbank:DB00157 diseasome:diseases/1 drugbank:DB00157 Reciprocity can be used to determine Data Quality and Completeness
  • 22. H-Index Set Out. A set F of vertices, where H is the maximum number such that the vertices in F each have at least H out-going neighbors. (Figure: example graph with vertices S1 to S5 and O1 to O4 and labels P1 to P8.)
  • 23. H-Index Set Out. A set F of vertices, where 2 is the maximum number such that the vertices in F each have at least 2 out-going neighbors. F={S1,S2,S3}, with 3, 3, and 2 out-going neighbors. (Figure: example graph with vertices S1 to S5 and O1 to O4 and labels P1 to P8.)
  • 24. H-Index Set In. A set F of vertices, where H is the maximum number such that the vertices in F each have at least H in-coming neighbors. (Figure: example graph with vertices S1 to S5 and O1 to O4 and labels P1 to P8.)
  • 25. H-Index Set In. A set F of vertices, where 3 is the maximum number such that the vertices in F each have at least 3 in-coming neighbors. F={O1,O2,O3}, with 3, 3, and 3 in-coming neighbors. (Figure: example graph with vertices S1 to S5 and O1 to O4 and labels P1 to P8.)
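The H-index set definition above can be sketched as follows. The function `h_index_set` is a hypothetical helper, not part of Graphium's API, and the toy edge list is an assumption: it was chosen only to reproduce the neighbor counts shown on the slides (3, 3, 2 out-going and 3, 3, 3 in-coming), since the exact edges of the figure are not recoverable from the transcript.

```python
from collections import defaultdict


def h_index_set(edges, direction="out"):
    """Return (h, F) for a directed graph given as (source, target) pairs.

    h is the largest number such that h vertices each have at least h
    out-going (direction="out") or in-coming (direction="in") neighbors;
    F is the set of all vertices with at least h such neighbors.
    """
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")

    neighbors = defaultdict(set)
    for source, target in edges:
        if direction == "out":
            neighbors[source].add(target)
        else:
            neighbors[target].add(source)

    # Classic h-index scan over neighbor counts in descending order.
    counts = sorted((len(ns) for ns in neighbors.values()), reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break

    members = {v for v, ns in neighbors.items() if h > 0 and len(ns) >= h}
    return h, members


# Hypothetical toy graph matching the neighbor counts on the slides.
EDGES = [
    ("S1", "O1"), ("S1", "O2"), ("S1", "O3"),
    ("S2", "O1"), ("S2", "O2"), ("S2", "O3"),
    ("S3", "O1"), ("S3", "O2"),
    ("S4", "O3"),
]

if __name__ == "__main__":
    print(h_index_set(EDGES, "out"))  # h = 2, F = {S1, S2, S3}
    print(h_index_set(EDGES, "in"))   # h = 3, F = {O1, O2, O3}
```

Neighbors are stored as sets, so several predicates between the same pair of vertices count as one neighbor, in line with the "neighbors" wording of the definition.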
  • 26. Graph invariants. “References and GO annotations of the targets associated with the Micro Nutrient Drugs”. The Drugbank SPARQL endpoint times out on this query:
      SELECT DISTINCT * WHERE {
        ?s drugbank:drugCategory <http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugcategory/micronutrient> .
        ?s drugbank:target ?o .
        ?o drugbank:drugReference ?o2 .
        ?o drugbank:goClassificationComponent ?o3
      }
  • 27. Graph invariants. 2-hop neighborhood of Micro Nutrient Drugs: 48 Drugs, 685 Targets, References, GO Terms.
  • 29. H-Index Out: 10 Drugs have at least 57 out-going links. (48 Drugs, 685 Targets, References, GO Terms.)
  • 30. H-Index Out: 47 Targets have at least 57 out-going links. (48 Drugs, 685 Targets, References, GO Terms.)
  • 31. H-Index In: 6 References have at least 21 in-coming links. (48 Drugs, 685 Targets, References, GO Terms.) H-Index Sets can be used to explain query complexity.
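The 2-hop neighborhood that these slides analyze can be collected with a bounded breadth-first traversal. The sketch below is an assumption about how such a subgraph might be extracted, not a description of Graphium's Traversal API; `k_hop_edges` and the vertex names in the usage example are hypothetical.

```python
from collections import defaultdict


def k_hop_edges(edges, seeds, k=2):
    """Collect the directed edges reachable within k hops from the seed vertices.

    edges is an iterable of (source, target) pairs; the result is the set of
    edges traversed when expanding the seeds k times along out-going edges.
    """
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)

    frontier = set(seeds)
    visited = set(seeds)
    collected = set()
    for _ in range(k):
        next_frontier = set()
        for source in frontier:
            for target in adjacency.get(source, ()):
                collected.add((source, target))
                if target not in visited:
                    visited.add(target)
                    next_frontier.add(target)
        frontier = next_frontier
    return collected


if __name__ == "__main__":
    # Hypothetical toy data: drug -> target -> reference / GO term.
    toy_edges = [
        ("drug1", "target1"),
        ("target1", "reference1"),
        ("target1", "goterm1"),
        ("reference1", "other"),   # three hops away from drug1, not collected
        ("drug2", "target2"),      # drug2 is not a seed, not collected
    ]
    print(sorted(k_hop_edges(toy_edges, {"drug1"}, k=2)))
```

The H-index computation can then be run on the collected edges alone, which is one way to relate the shape of the neighborhood to the cost of the query over it.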
  • 32. H-Index Sets to Validate Potential Novel Associations
  • 33. H-Index Sets. Network of Targets and Drugs.
  • 34. H-Index Sets. Targets: a set F of targets, where H is the maximum number such that the targets in F each have at least H out-going neighbors.
  • 35. H-Index Sets. Targets: a set F of targets, where H is the maximum number such that the targets in F each have at least H out-going neighbors. Drugs: a set F of drugs, where H is the maximum number such that the drugs in F each have at least H in-coming neighbors.
  • 36. Set of Targets and Drugs
      - 900 Drugs, 1,000 Targets and 5,000 Interactions: Nuclear receptors, G protein-coupled receptors (GPCRs), Ion channels, and Enzymes.
      - DrugBank
      K. Bleakley and Y. Yamanishi. Supervised prediction of drug target interactions using bipartite local models. Bioinformatics, 25(18), 2009.
      GPCR dataset:
      Drugs: 223
      Targets: 95
      Interactions: 635
      Avg. Interactions per Target: 6.68
      Avg. Interactions per Drug: 2.84
  • 37. Drugbank Drugs in the dataset of G protein-coupled receptors (GPCRs). H-index Out is 14. 15 Targets are in the H-Index Set Out: F={hsa:1128, hsa:1129, hsa:146, hsa:147, hsa:148, hsa:150, hsa:151, hsa:152, hsa:153, hsa:154, hsa:155, hsa:1812, hsa:1813, hsa:3269, hsa:3356}
  • 38. Drugbank Drugs in the dataset of G protein-coupled receptors (GPCRs). H-index Out is 14. 15 Targets are in the H-Index Set Out.
  • 40. H-Index Sets. Associations between Drugs and Targets that are not in Drugbank: (D02076, hsa:146), (D02076, hsa:147), (D00604, hsa:147). The targets belong to the H-index Set. Validated in STITCH: http://stitch.embl.de/
  • 41. H-Index Sets. Associations between Drugs and Targets that are not in Drugbank: (D02076, hsa:146), (D02076, hsa:147), (D00604, hsa:147). The targets belong to the H-index Set. Validated in STITCH: http://stitch.embl.de/ H-Index Sets can be used to Validate the Discovered Associations.
  • 43. Conclusions
      Graph invariants:
      - Remain the same for any two isomorphic graphs, regardless of representation.
      - Allow for uncovering hidden properties of graphs: Reciprocity, Density, H-Index Set.
      Reciprocity can suggest data quality and incompleteness.
      Density can be used to explain the complexity of graph tasks.
      H-index sets can comprise entities useful for discovering potential novel associations.