4. Resource Description Framework (RDF) Model
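Properties and relationships are represented as predicates.
[Figure: RDF graph with subject, predicate and object roles, in which The Beatles are linked by the predicate created to the albums Let it be, Revolver and Help!; the predicates year (1970, 1965, 1966) and duration (35:16, 35:01) describe the albums, and Liverpool and thebeatles.com describe the band.]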
Source: “Scaling Up Linked Data”.
EUCLID project.
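A minimal Python sketch of the RDF model, assuming plain (subject, predicate, object) tuples. The `ex:` names and the `objects` helper are hypothetical, not IRIs or functions from the source, and only the album release years are encoded. Every fact, including a literal such as a year, is a separate triple.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The "ex:" prefixed names are hypothetical identifiers used only for illustration.
TRIPLES = [
    ("ex:TheBeatles", "ex:created", "ex:LetItBe"),
    ("ex:TheBeatles", "ex:created", "ex:Revolver"),
    ("ex:TheBeatles", "ex:created", "ex:Help"),
    ("ex:LetItBe", "ex:year", "1970"),
    ("ex:Revolver", "ex:year", "1966"),
    ("ex:Help", "ex:year", "1965"),
]


def objects(triples, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Relationships (created) and properties (year) are both followed through predicates.
albums = objects(TRIPLES, "ex:TheBeatles", "ex:created")
years = {album: objects(TRIPLES, album, "ex:year")[0] for album in albums}
print(years)
```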
5. Semantic Data Management
RDF Graphs
RDF Engines
SPO SOP PSO POS OSP OPS
SPARQL queries that represent graph patterns
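The six orderings listed for RDF engines can be read as one index per permutation of subject, predicate and object, as in systems such as Hexastore and RDF-3X, so that a triple pattern with any combination of bound positions becomes a prefix lookup. A minimal in-memory sketch; the `SixIndex` class and its `match` method are hypothetical, not the API of any engine:

```python
from collections import defaultdict
from itertools import permutations

POSITION = {"s": 0, "p": 1, "o": 2}


class SixIndex:
    """Hypothetical in-memory triple store with one nested index per ordering
    of the positions (spo, sop, pso, pos, osp, ops)."""

    def __init__(self, triples):
        self.indexes = {}
        for order in permutations("spo"):
            # index[first][second] -> set of values in the third position
            index = defaultdict(lambda: defaultdict(set))
            for triple in triples:
                a, b, c = (triple[POSITION[k]] for k in order)
                index[a][b].add(c)
            self.indexes["".join(order)] = index

    def match(self, s=None, p=None, o=None):
        """Return the sorted triples matching a pattern; None marks a variable."""
        bound = {"s": s, "p": p, "o": o}
        # Put bound positions first so the lookup is a prefix scan of one index.
        order = sorted("spo", key=lambda k: bound[k] is None)
        index = self.indexes["".join(order)]
        first, second, third = order
        results = []
        firsts = [bound[first]] if bound[first] is not None else list(index)
        for a in firsts:
            level2 = index.get(a, {})
            seconds = [bound[second]] if bound[second] is not None else list(level2)
            for b in seconds:
                for c in level2.get(b, ()):
                    if bound[third] is not None and c != bound[third]:
                        continue
                    values = dict(zip(order, (a, b, c)))
                    results.append((values["s"], values["p"], values["o"]))
        return sorted(results)


store = SixIndex([
    ("ex:TheBeatles", "ex:created", "ex:Revolver"),
    ("ex:TheBeatles", "ex:created", "ex:Help"),
    ("ex:Revolver", "ex:year", "1966"),
])
print(store.match(p="ex:created"))  # answered from the pso index
print(store.match(o="1966"))        # answered from the osp index
```

The design trades roughly six times the storage for never having to scan triples whose leading positions do not match the pattern.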
6. Property Graph Model
Nodes and edges may have properties
Properties: Key-value pairs
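[Figure: property graph in which The Beatles are linked by created edges to the albums Let it be, Revolver and Help!; the album nodes carry properties such as Year (1970, 1965, 1966) and Duration (35:16, 35:01), and The Beatles node carries Homepage: thebeatles.com and Origin: Liverpool.]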
Source: “Scaling Up Linked Data”.
EUCLID project.
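A minimal Python sketch of the same data as a property graph, assuming plain dataclasses. The `Node` and `Edge` classes, the node identifiers and the `name` key are hypothetical; Year, Origin and Homepage follow the figure. Unlike the RDF version, the year, origin and homepage are key-value properties stored on the nodes, and only created remains an edge.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)  # edges may carry key-value pairs too


# Hypothetical identifiers; property keys follow the figure (Year, Origin, Homepage).
nodes = {
    "beatles": Node("beatles", {"name": "The Beatles", "Origin": "Liverpool",
                                "Homepage": "thebeatles.com"}),
    "let_it_be": Node("let_it_be", {"name": "Let it be", "Year": 1970}),
    "revolver": Node("revolver", {"name": "Revolver", "Year": 1966}),
    "help": Node("help", {"name": "Help!", "Year": 1965}),
}
edges = [Edge("beatles", "created", target) for target in ("let_it_be", "revolver", "help")]


def created_years(band_id):
    """Follow 'created' edges from a band and read the Year property of each album."""
    return {
        nodes[e.target].properties["name"]: nodes[e.target].properties["Year"]
        for e in edges
        if e.source == band_id and e.label == "created"
    }


print(created_years("beatles"))
```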
8. Semantic Data Management
RDF Graphs | RDF Engines (SPO, SOP, PSO, POS, OSP, OPS) | SPARQL queries that represent graph patterns
Property Graphs | Graph Database Engines | Edges & nodes, neighborhoods, graph-based tasks
9. Benchmark of Graphs
Graph Name | #Nodes | #Edges | Density | #Labels
DSJC1000.1 [Johnson91] | 1,000 | 99,258 | 0.099 | 1
DSJC1000.5 [Johnson91] | 1,000 | 499,652 | 0.50 | 1
DSJC1000.9 [Johnson91] | 1,000 | 898,898 | 0.899 | 1
USA-road-d.NY | 264,346 | 730,100 | 0.00001045 | 7,970
USA-road-d.FLA | 1,070,376 | 2,687,902 | 0.00000235 | 22,704
Berlin10M | 2,743,235 | 9,709,119 | 0.00000129 | 40
[Johnson91] Johnson, D., Aragon, C., McGeoch, L., and Schevon, C. Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning. Operations Research 39, 3 (1991), 378–406.
USA-road-d.* graphs: 9th DIMACS Implementation Challenge - Shortest Paths, http://www.dis.uniroma1.it/challenge9/download.shtml
Berlin10M: Berlin SPARQL Benchmark, http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
COLD 2013
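The Density column can be reproduced, to within the precision shown, by dividing the number of edges by the n(n - 1) possible edges of a complete digraph on n nodes, the definition used for the graph invariants. A small Python check; the `density` helper is a hypothetical name:

```python
def density(num_nodes: int, num_edges: int) -> float:
    """Edges divided by the n * (n - 1) possible edges of a complete digraph on n nodes."""
    if num_nodes < 2:
        raise ValueError("density needs at least two nodes")
    return num_edges / (num_nodes * (num_nodes - 1))


# (#Nodes, #Edges) pairs taken from the benchmark table.
BENCHMARK = {
    "DSJC1000.1": (1_000, 99_258),
    "DSJC1000.5": (1_000, 499_652),
    "DSJC1000.9": (1_000, 898_898),
    "USA-road-d.NY": (264_346, 730_100),
    "USA-road-d.FLA": (1_070_376, 2_687_902),
    "Berlin10M": (2_743_235, 9_709_119),
}

for name, (n, m) in BENCHMARK.items():
    print(f"{name}: {density(n, m):.3g}")
```

The road networks and Berlin10M are several orders of magnitude sparser than the DSJC graphs even though they have far more nodes.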
17. Graph Invariants
Invariant | Description
Vertex and edge count | Number of vertices and edges in the graph.
Graph density | Number of edges in the graph divided by the number of possible edges in a complete digraph, i.e., |E| / (|V|(|V| - 1)).
Reciprocity | The extent to which a triple that relates resources A and B is reciprocated by another triple that relates B with A.
In- and out-degree distribution | Distribution of the number of in-coming and out-going edges of the vertices of a graph.
In-coming and out-going H-index | The maximum number h such that h vertices each have at least h in-coming (resp., out-going) neighbors in the graph.
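A minimal Python sketch of the first and fourth invariants, assuming the graph is given as (subject, predicate, object) triples and that literal objects count as vertices; the `basic_invariants` helper and the example triples are hypothetical. A degree distribution maps a degree k to the number of vertices with that degree.

```python
from collections import Counter


def basic_invariants(triples):
    """Vertex count, edge count, and in-/out-degree distributions of an RDF graph.

    A degree distribution maps a degree k to the number of vertices with degree k.
    """
    triples = set(triples)  # an RDF graph is a set of triples
    vertices = {s for s, _, _ in triples} | {o for _, _, o in triples}
    out_degree = Counter(s for s, _, _ in triples)
    in_degree = Counter(o for _, _, o in triples)
    return {
        "vertices": len(vertices),
        "edges": len(triples),
        # Counter returns 0 for missing keys, so vertices without edges count as degree 0.
        "out_degree_distribution": Counter(out_degree[v] for v in vertices),
        "in_degree_distribution": Counter(in_degree[v] for v in vertices),
    }


example = [("S1", "P1", "O1"), ("S1", "P2", "O2"), ("S2", "P1", "O1")]
print(basic_invariants(example))
```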
19. Reciprocity: Reciprocal edges indicate stronger relationships between vertices.
Graph invariants
[Figure: links between Drugbank and Diseasome through the predicates drugbank:possibleDiseaseTarget and diseasome:possibleDrug]
drugbank:DB00157 drugbank:possibleDiseaseTarget diseasome:diseases/0, diseasome:diseases/1, diseasome:diseases/4198, …
diseasome:diseases/0 diseasome:possibleDrug drugbank:DB00157
diseasome:diseases/1 diseasome:possibleDrug drugbank:DB00157
Reciprocity values less than 1.0 indicate that some drugs are associated with diseases that do not have the reciprocal link.
Reciprocity can be used to assess data quality and completeness.
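A minimal Python sketch of reciprocity as the fraction of linked (subject, object) pairs whose reverse pair is also linked. It assumes predicates are ignored, so drugbank:possibleDiseaseTarget is reciprocated by diseasome:possibleDrug, and that self-loops are excluded; the helper names are hypothetical. The data is only the fragment visible on the slide (the elided "…" triples are not included), so the resulting value describes that fragment, not the whole dataset.

```python
def reciprocity(triples):
    """Fraction of linked (subject, object) pairs whose reverse pair is also linked.

    Predicates are ignored, so drugbank:possibleDiseaseTarget from a drug to a
    disease is reciprocated by diseasome:possibleDrug from the disease back to the drug.
    Self-loops are excluded because they are trivially reciprocal.
    """
    pairs = {(s, o) for s, _, o in triples if s != o}
    if not pairs:
        return 0.0
    return sum((o, s) in pairs for s, o in pairs) / len(pairs)


def missing_reciprocal(triples):
    """(subject, object) pairs that have no link in the opposite direction."""
    pairs = {(s, o) for s, _, o in triples if s != o}
    return sorted((s, o) for s, o in pairs if (o, s) not in pairs)


# Only the triples visible on the slide; the elided "…" triples are not included.
TRIPLES = [
    ("drugbank:DB00157", "drugbank:possibleDiseaseTarget", "diseasome:diseases/0"),
    ("drugbank:DB00157", "drugbank:possibleDiseaseTarget", "diseasome:diseases/1"),
    ("drugbank:DB00157", "drugbank:possibleDiseaseTarget", "diseasome:diseases/4198"),
    ("diseasome:diseases/0", "diseasome:possibleDrug", "drugbank:DB00157"),
    ("diseasome:diseases/1", "diseasome:possibleDrug", "drugbank:DB00157"),
]
print(reciprocity(TRIPLES), missing_reciprocal(TRIPLES))
```

Listing the unreciprocated pairs, and not just the ratio, points directly at the links a curator would need to check.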
22. H-Index Set Out
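[Figure: example graph with subjects S1 to S5, objects O1 to O4, and edges labeled with predicates P1 to P8]
A set F of vertices, where H is the maximum number such that at least H vertices each have at least H out-going neighbors; F contains the vertices that each have at least H out-going neighbors.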
23. H-Index Set Out
[Same example graph]
Here H = 2: it is the maximum number such that the vertices in F each have at least 2 out-going neighbors.
F = {S1, S2, S3}, whose vertices have 3, 3 and 2 out-going neighbors.
24. H-Index Set In
[Same example graph]
A set F of vertices, where H is the maximum number such that at least H vertices each have at least H in-coming neighbors; F contains the vertices that each have at least H in-coming neighbors.
25. H-Index Set In
[Same example graph]
Here H = 3: it is the maximum number such that the vertices in F each have at least 3 in-coming neighbors.
F = {O1, O2, O3}, whose vertices have 3, 3 and 3 in-coming neighbors.
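A minimal Python sketch of the H-Index Set Out and In. It assumes H is computed like a citation h-index over distinct neighbors and that F contains every vertex with at least H neighbors, which is why F may hold more than H vertices. The edge list is hypothetical, chosen only to reproduce the degrees shown on the slides, since the actual edges of the figure are not recoverable; the helper names are also hypothetical.

```python
from collections import Counter


def degrees(edges, direction):
    """Number of distinct out-going ('out') or in-coming ('in') neighbors per vertex."""
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")
    pairs = set(edges)  # parallel edges to the same neighbor count once
    return Counter(s if direction == "out" else o for s, o in pairs)


def h_index(degree):
    """Largest h such that at least h vertices each have at least h neighbors."""
    h = 0
    for rank, d in enumerate(sorted(degree.values(), reverse=True), start=1):
        if d < rank:
            break
        h = rank
    return h


def h_index_set(degree):
    """Return (H, F): the h-index and the set of vertices with at least H neighbors."""
    h = h_index(degree)
    return h, {v for v, d in degree.items() if h > 0 and d >= h}


# Hypothetical (subject, object) edges chosen to reproduce the degrees on the slides;
# the actual edges of the slide figure are not known.
EDGES = [
    ("S1", "O1"), ("S1", "O2"), ("S1", "O3"),
    ("S2", "O1"), ("S2", "O2"), ("S2", "O3"),
    ("S3", "O1"), ("S3", "O2"),
    ("S4", "O3"),
    ("S5", "O4"),
]
print(h_index_set(degrees(EDGES, "out")))  # H-Index Set Out
print(h_index_set(degrees(EDGES, "in")))   # H-Index Set In
```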
26. Graph invariants
SELECT DISTINCT *
WHERE {
?s drugbank:drugCategory <http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugcategory/micronutrient>.
?s drugbank:target ?o.
?o drugbank:drugReference ?o2.
?o drugbank:goClassificationComponent ?o3
}
The Drugbank SPARQL endpoint times out on this query.
Query intent: “References and GO annotations of the targets associated with the Micro Nutrient Drugs”
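The query body is a basic graph pattern: four triple patterns joined on the shared variables ?s and ?o. A minimal Python sketch of naive nested-loop evaluation, assuming a tiny hypothetical dataset in which `ex:drugA`, `ex:target1`, `ex:ref1`, `ex:go1` and the constant `ex:micronutrient` (standing in for the full drug-category IRI) are invented for illustration; the `extend` and `evaluate_bgp` helpers are hypothetical too.

```python
def extend(binding, pattern, triple):
    """Extend a variable binding so that the triple pattern matches the triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable
            if new.setdefault(term, value) != value:
                return None               # already bound to a different value
        elif term != value:               # constant that does not match
            return None
    return new


def evaluate_bgp(patterns, triples):
    """Naive nested-loop evaluation of a basic graph pattern (a list of triple patterns)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data; "ex:micronutrient" stands in for the drug-category IRI in the query.
TRIPLES = [
    ("ex:drugA", "drugbank:drugCategory", "ex:micronutrient"),
    ("ex:drugB", "drugbank:drugCategory", "ex:other"),
    ("ex:drugA", "drugbank:target", "ex:target1"),
    ("ex:drugB", "drugbank:target", "ex:target2"),
    ("ex:target1", "drugbank:drugReference", "ex:ref1"),
    ("ex:target1", "drugbank:goClassificationComponent", "ex:go1"),
    ("ex:target2", "drugbank:drugReference", "ex:ref2"),
]
PATTERN = [
    ("?s", "drugbank:drugCategory", "ex:micronutrient"),
    ("?s", "drugbank:target", "?o"),
    ("?o", "drugbank:drugReference", "?o2"),
    ("?o", "drugbank:goClassificationComponent", "?o3"),
]
print(evaluate_bgp(PATTERN, TRIPLES))
```

For every target, the number of solutions multiplies by its references times its GO annotations, which is one plausible reason such a pattern becomes expensive on a large endpoint; real engines rely on indexes and join ordering rather than rescanning all triples for each pattern.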
34. H-Index Sets
A set F of targets, where H is the maximum number such that the targets in F each have at least H out-going neighbors.
[Figure: H-index set of targets]
A set F of drugs, where H is the maximum number such that the drugs in F each have at least H in-coming neighbors.
[Figure: H-index sets of targets and drugs]
36. Set of Targets and Drugs
900 drugs, 1,000 targets and 5,000 interactions in four target classes: nuclear receptors, G-protein-coupled receptors (GPCRs), ion channels, and enzymes.
DrugBank
K. Bleakley and Y. Yamanishi. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics, 25(18), 2009.
GPCR dataset:
Drugs | 223
Targets | 95
Interactions | 635
Avg. interactions per target | 6.68
Avg. interactions per drug | 2.84
37. Drugbank drugs in the G-protein-coupled receptor (GPCR) dataset
The H-index Out is 14.
15 targets are in the H-Index Set Out:
F = {hsa:1128, hsa:1129, hsa:146, hsa:147, hsa:148, hsa:150, hsa:151, hsa:152, hsa:153, hsa:154, hsa:155, hsa:1812, hsa:1813, hsa:3269, hsa:3356}
40. H-Index Sets
Associations between drugs and targets that are not in Drugbank, where the targets belong to the H-index set:
D02076 hsa:146
D02076 hsa:147
D00604 hsa:147
Validated in STITCH http://stitch.embl.de/
H-index sets can be used to validate the discovered associations.
43. Conclusions
Graph invariants:
Remain the same for isomorphic graphs, regardless of their representation.
Allow for uncovering hidden properties of graphs.
Reciprocity can reveal data quality issues and incompleteness.
Density can be used to explain the complexity of graph-based tasks.
H-index sets can contain entities that are useful for discovering potential novel associations.