This document discusses multi-relational graph structures and their applications. It introduces single-relational graphs where all edges share the same meaning, and multi-relational graphs where each edge is labeled to denote the type of relationship between vertices. The talk presents an algebra for manipulating multi-relational graphs and applications including recommender systems. Examples of multi-relational graphs used in scholarly networks and the semantic web are provided.
Multi-Relational Graph Structures From Algebra to Application
1. Multi-Relational Graph Structures:
From Algebra to Application
Marko A. Rodriguez
T-5, Center for Nonlinear Studies
Los Alamos National Laboratory
http://markorodriguez.com
October 27, 2009
2. Abstract
In a single-relational graph, all edges share the same meaning. In contrast,
a multi-relational graph represents a heterogeneous set of edges, where
each edge is labeled to denote the type of relationship that exists between
the two vertices it connects. While less prevalent than the single-relational
graph, the multi-relational graph structure is beginning to see widespread
adoption in both academia and industry. An algebra for manipulating
multi-relational graph structures and the realization of this algebra in
various application scenarios is presented in this talk.
MIT Lincoln Laboratory Lecture – Lexington, Massachusetts – October 27, 2009
3. My Computer Eco-System
• Articles/Lectures: LaTeX, OmniGraffle, LaTeXiT
• Software Development: Java, R Statistics
• Large-Scale Data Management: MySQL, Neo4j, Linked Process
• Graph/Network Analysis: iGraph, rPath, Confluence, JUNG
• Web of Data/Semantic Web: Open Sesame (SAIL), Protégé
• 3D Modeling/Programming: Java Monkey Engine, Blender, Gimp
• Audio Synthesis/Processing: Max/MSP, ProTools
4. Outline
• Introduction to Graph Structures
The Single-Relational Graph
The Multi-Relational Graph
• A Multi-Relational Path Algebra
• Application to Recommender Systems
6. A Single-Relational Graph Example
[Figure: a directed graph over Article A, Article B, Article C, Article D, Article E, and Article F.]
An article citation graph. Each vertex is an article and each edge denotes that the tail
article cites the head article.
7. Single-Relational Graph Notation
• Homogeneous set of vertex and edge types (unless the graph is bipartite).
• There are undirected and directed forms, where V is the set of vertices
and E is a set of unordered or ordered vertex pairs, respectively.
Undirected: $G = (V, E \subseteq \{V \times V\})$
Directed: $G = (V, E \subseteq (V \times V))$ (we will focus on directed graphs in this talk.)
• There is an adjacency matrix representation $A \in \{0, 1\}^{n \times n}$, where
$n = |V|$ and
$$A_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise.} \end{cases}$$
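The adjacency matrix rule can be sketched in a few lines of Python. The edge list below is hypothetical (the citation edges of the example figure are not given in the text); only the rule "entry (i, j) is 1 if (i, j) is in E" comes from the notation above.

```python
# Hypothetical directed citation edges (tail cites head), for illustration only.
vertices = ["A", "B", "C", "D", "E", "F"]
edges = {("A", "B"), ("B", "C"), ("D", "C"), ("E", "D"), ("D", "F")}

# Map each vertex to a row/column index.
index = {v: i for i, v in enumerate(vertices)}
size = len(vertices)

# A[i][j] = 1 if (i, j) is in E, and 0 otherwise.
A = [[0] * size for _ in range(size)]
for tail, head in edges:
    A[index[tail]][index[head]] = 1

for row in A:
    print(row)
```

Because the graph is directed, the matrix is generally not symmetric: an edge from A to B sets only the (A, B) entry.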
8. The Use of Single-Relational Graphs in Research
• Most common graph structure used in 90’s and 00’s research.
scholarly graphs: citations, coauthorship relationships, article/journal
usage, acknowledgements, funding sources.
technological graphs: software dependencies, Internet architecture,
web citations.
communication graphs: email correspondence, cell phone calls,
micro-blog “following.”
• Numerous algorithms have been developed for analyzing such structures.
geodesics: radius, diameter, eccentricity, closeness, betweenness.
spectral: eigenvector centrality, pagerank, spreading activation.
community detection: walktrap, edge betweenness, leading
eigenvector, spin-glass.
9. My Work with Single-Relational Graphs
• Articles of mine that make use of the single-relational graph structure.
Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L.M.A, Chute, R., Rodriguez, M.A., Balakireva,
L.L., “Clickstream Data Yields High-Resolution Maps of Science,” PLoS One, 4(3), e4803, 2009.
Bollen, J., Van de Sompel, H., Rodriguez, M.A., “Towards Usage-Based Impact Metrics: First Results from the MESUR
Project,” Joint Conference on Digital Libraries (JCDL), 2008.
Rodriguez, M.A., Pepe, A., “On the Relationship Between the Structural and Socioacademic Communities of
a Coauthorship Network,” Journal of Informetrics, 2(3), pp. 195–201, 2008.
Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information and Knowledge
Management (CIKM), pp. 319–328, 2008.
Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Mapping the Bid Behavior of Conference Referees,” Journal
of Informetrics, 1(1), pp. 62–82, 2007.
Bollen, J., Rodriguez, M.A., Van de Sompel, H., “Journal Status,” Scientometrics, 69(3), pp. 669-687, 2006.
Rodriguez, M.A., Bollen, J., Van de Sompel, H., “The Convergence of Digital Libraries and the Peer-Review Process,”
Journal of Information Science, 32(2), pp. 149–159, 2006.
Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the
North American Association for Computational Social and Organizational Science Conference, 2004.
• They focus on supporting/analyzing/ranking/visualizing the scholarly
community and large-scale decision support systems (i.e. governance
systems).
10. Studying the Reading Behavior of Scholars
Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L.M.A, Chute, R., Rodriguez, M.A., Balakireva, L.L., “Clickstream
Data Yields High-Resolution Maps of Science,” PLoS One, 4(3), e4803, 2009.
12. Predicting Referees Based on Coauthorship Patterns
[Figure: a network of scholars, with vertices labeled by surname: BORGMAN, WITTEN, TAYLOR, RECKER, MOORE, BISHOFF, MARSHALL, CUNNINGHAM, SUMNER, CASTELLI, RAY, CASSEL, FURUTA, GOLOVCHINSKY, FUHR, GIERSCH, THANOS, SOMPEL, FOX, ALLEN, NEUHOLD, SOLVBERG, FULKER, ARMS, NELSON, CHEN, FOO, LEGGETT, JANEE, LAGOZE, MARCHIONINI, LYNCH, RASMUSSEN, BAKER, LIM, SANCHEZ, WRIGHT, JESUROGA, TSE, SUGIMOTO, KHOO.]
Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Mapping the Bid Behavior of Conference Referees,” Journal of Informetrics,
1(1), pp. 62–82, 2007.
13. A Multi-Relational Graph Example
[Figure: a directed graph over Article B, Article C, Article D, Article F, Person A, and Person E, with edges labeled cites, acknowledges, authored, and peer-reviewed.]
A scholarly graph. Each vertex is a scholarly artifact and each edge denotes the type of
directed relationship that exists between the two scholarly artifacts it connects.
14. Multi-Relational Graph Notation
• Heterogeneous set of vertex types and a heterogeneous set of edge types.
• This data structure is becoming more prevalent due to both the Semantic
Web/Web of Data movement and the graph database movement.
• $G = (V, \mathbb{E} = \{E_0, E_1, \ldots, E_m \subseteq (V \times V)\})$, where $\mathbb{E}$ is a family of typed
edge sets of length m. For example, $E_0$ is the “authored” adjacency
matrix, $E_1$ is the “cites” adjacency matrix, etc.
• There is a three-way tensor representation $A \in \{0, 1\}^{n \times n \times m}$, where
$$A^k_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E_k : k \le m \\ 0 & \text{otherwise.} \end{cases}$$
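As a rough illustration of the tensor notation, the sketch below stacks one adjacency matrix per edge type. The vertex names and edges are hypothetical, and the slices are indexed from 0 in the order of the sorted labels, which is a choice made here rather than something fixed by the slides.

```python
# Hypothetical vertices and typed edge sets, for illustration only.
vertices = ["person_a", "person_e", "article_b", "article_c"]
typed_edges = {
    "authored": {("person_a", "article_b"), ("person_e", "article_c")},
    "cites": {("article_b", "article_c")},
}

index = {v: i for i, v in enumerate(vertices)}
labels = sorted(typed_edges)  # slice order: ["authored", "cites"]
size = len(vertices)

# tensor[k][i][j] = 1 if (i, j) is in the k-th edge set, and 0 otherwise.
tensor = []
for label in labels:
    slice_k = [[0] * size for _ in range(size)]
    for tail, head in typed_edges[label]:
        slice_k[index[tail]][index[head]] = 1
    tensor.append(slice_k)

authored = tensor[labels.index("authored")]
cites = tensor[labels.index("cites")]
```

Each slice is an ordinary single-relational adjacency matrix, which is what lets the later operations treat them with plain matrix arithmetic.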
15. A Three-Way Tensor Representation of a
Multi-Relational Graph
$A \in \{0, 1\}^{n \times n \times m}$
[Figure: the tensor drawn as a stack of m adjacency matrices, each of size |V| × |V| = n × n, with one slice per edge type (e.g. "authored", "cites") and |E| = m slices in total.]
16. My Work with Multi-Relational Graphs
• Articles of mine that make use of the multi-relational graph structure.
Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis
Algorithms,” Journal of Informetrics, in press, 2009. [Presented in the second part of this presentation.]
Rodriguez, M.A., Geldart, J., “An Evidential Path Logic for Multi-Relational Networks,” Proceedings of the Association
for the Advancement of Artificial Intelligence Spring Symposium: Technosocial Predictive Analytics Symposium, volume
SS-09-09, pp. 114–119, 2009.
Rodriguez M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks,” ACM
Transactions on Information Systems, 27(2), pp. 1–20, 2009.
Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems,
21(7), pp. 727–739, 2008. [Presented in the third part of this presentation.]
Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii
International Conference on Systems Science (HICSS), pp. 39–49, 2007.
Bollen, J., Rodriguez, M.A., Van de Sompel, H., Balakireva, L.L., Hagberg, A., “The Largest Scholarly Semantic
Network...Ever.,” ACM World Wide Web Conference, 2007.
Rodriguez, M.A., “A Multi-Relational Network to Support the Scholarly Communication Process,” International Journal
of Public Information Systems, 2007(1), pp. 13–29, 2007.
• They focus on multi-relational graph algorithms, logic, information
retrieval, decision support systems, bibliometrics, recommender systems.
17. Resource Description Framework Graph
[Figure: the scholarly graph with URI-identified vertices (lanl:article_b, lanl:article_c, lanl:article_d, lanl:article_f, lanl:person_a, lanl:person_e) and edge labels (lanl:cites, lanl:acknowledges, lanl:authored, lanl:peer_reviewed), where the prefix lanl: expands to http://lanl.gov#.]
A scholarly graph. Each vertex and edge type is identified by a Uniform Resource
Identifier and thus, encoded in the address space of the World Wide Web.
18. Resource Description Framework Graph
• Vertices and edge labels are identified by Uniform Resource Identifiers
(URI). Thus, there is a single address space where the world’s data can
be interrelated.
• G ⊆ (U ∪ B) × U × (U ∪ B ∪ L), where U is the set of all URIs, B is
the set of all blank nodes, and L is the set of all literals.
• There exist various implementations of this standard model.
Open Sesame (http://openrdf.org/).
AllegroGraph (http://www.franz.com/agraph/allegrograph/).
OWLim (http://www.ontotext.com/owlim/).
Jena (http://jena.sourceforge.net/)
19. Linked Data and the Web of Data
[Figure: Linked Data about Albert Einstein. The vertex dbpedia:Albert_Einstein (http://dbpedia.org/resource/Albert_Einstein) has a dbpedia:doctoralAdvisor edge to dbpedia:Alfred_Kleiner, a dbpedia:citizenship edge to dbpedia:United_States, and a dbpprop:hasPhotoCollection edge to flickr:Albert_Einstein (http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Albert_Einstein), which in turn has foaf:depiction edges to http://farm1.static.flickr.com/60/170621225_661c705eb4_m.jpg and http://farm4.static.flickr.com/3408/3547607847_65abfd03a5_m.jpg.]
20. My Work with Resource Description Framework Graphs
• Articles of mine that make use of RDF/Web of Data/Semantic Web.
Rodriguez, M.A., “Interpretations of the Web of Data,” Data Management in the Semantic Web, eds. H. Jin and Z. Lv,
Nova, in press, 2009.
Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,” Bulletin of the American Society for
Information Science and Technology, 35(6), pp. 38–43, 2009.
Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” http://arxiv.org/abs/0903.0194, February
2009.
Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly
Communication Process,” KRS-2009-02, 2009. [Presented in the third part of this presentation.]
Rodriguez, M.A., Watkins, J., “Faith in the Algorithm, Part 2: Computational Eudaemonics,” Lecture Notes in Artificial
Intelligence, eds. Velásquez, J.D., Howlett, R.J., and Jain, L.C., volume 5712, pp. 813–820, 2009.
Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate,” Emergent Web Intelligence,
Advanced Information and Knowledge Processing series, Eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, in
press, 2008.
Rodriguez, M.A., Pepe, A., Shinavier, J., “The Dilated Triple,” Emergent Web Intelligence, Advanced Information and
Knowledge Processing series, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, in press, 2008.
• They focus on graph algorithms, distributed computing, graph-based
computing, recommender systems.
21. The Web of Data as of March 2009
[Figure: the Linked Data cloud drawn as a graph of interlinked data sets, including dbpedia, freebase, geonames, musicbrainz, uniprot, pubmed, drugbank, opencyc, yago, umbel, foafprofiles, and the dblp, acm, ieee, and citeseer bibliographic sets.]
Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” http://arxiv.org/abs/0903.0194, February 2009.
22. The Web of Data as of March 2009
data set           domain     | data set           domain     | data set           domain
audioscrobbler     music      | govtrack           government | pubguide           books
bbclatertotp       music      | homologene         biology    | qdos               social
bbcplaycountdata   music      | ibm                computer   | rae2001            computer
bbcprogrammes      media      | ieee               computer   | rdfbookmashup      books
budapestbme        computer   | interpro           biology    | rdfohloh           social
chebi              biology    | jamendo            music      | resex              computer
crunchbase         business   | laascnrs           computer   | riese              government
dailymed           medical    | libris             books      | semanticweborg     computer
dblpberlin         computer   | lingvoj            reference  | semwebcentral      social
dblphannover       computer   | linkedct           medical    | siocsites          social
dblprkbexplorer    computer   | linkedmdb          movie      | surgeradio         music
dbpedia            general    | magnatune          music      | swconferencecorpus computer
doapspace          social     | musicbrainz        music      | taxonomy           reference
drugbank           medical    | myspacewrapper     social     | umbel              general
eurecom            computer   | opencalais         reference  | uniref             biology
eurostat           government | opencyc            general    | unists             biology
flickrexporter     images     | openguides         reference  | uscensusdata       government
flickrwrappr       images     | pdb                biology    | virtuososponger    reference
foafprofiles       social     | pfam               biology    | w3cwordnet         reference
freebase           general    | pisa               computer   | wikicompany        business
geneid             biology    | prodom             biology    | worldfactbook      government
geneontology       biology    | projectgutenberg   books      | yago               general
geonames           geographic | prosite            biology    | ...
23. Application Development on the Web of Data
[Figure: two panels, each showing Application 1, Application 2, and Application 3 with their processes and structures on hosts 127.0.0.1, 127.0.0.2, and 127.0.0.3. In panel b, a shared Web of Data layer sits between the applications' processes and the structures.]
a.) standard model b.) Web of Data model: public data changes the development
paradigm.
24. A Key/Value Graph Example
[Figure: the scholarly graph with a key/value map on every vertex and edge.
Vertices: A (type = person, name = Marko, age = 29); E (type = person, name = Johan, age = 37); B (type = article, name = "Algori...", created = 1/1/09); C (type = article, name = "Network...", created = 2/1/08); D (type = article, name = "Linked...", created = 1/30/09); F (type = article, name = "A Distributed...", created = 12/1/07).
Edges: two of type = cites (weight = 1.0 each), one of type = acknowledges (weight = 1.0), three of type = authored (weights 1.0, 1.0, and 0.5), and one of type = peer-reviewed (weight = -1.0).]
A scholarly graph. Both vertices and edges maintain a key/value pair map that allows metadata to be
attached to them.
25. Key/Value Graph
• G = (V, E ⊆ (V × V ), λ : (V ∪ E) × Ω → Σ), where Ω is the set of keys
and Σ is the set of values.
• Has a convenient representation in object-oriented programming
languages and is used by various standards and graph packages.
GraphML (http://graphml.graphdrawing.org/).
Neo4j (http://neo4j.org).
NetworkX (http://networkx.lanl.gov).
Confluence (http://markorodriguez.com/docs/conf/api/).
iGraph (http://igraph.sourceforge.net/).
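A minimal sketch of the key/value graph definition, using plain Python dictionaries rather than any of the packages listed above. The property values are taken from the example slide, but the edge endpoints and the pairing of weights with edges are assumptions, and `lam` is a hypothetical name standing in for λ.

```python
# Vertices and directed edges; the edge endpoints are assumed for illustration.
V = {"A", "E", "D"}
E = {("A", "D"), ("E", "D")}

# One key/value map per vertex and per edge.
properties = {
    "A": {"type": "person", "name": "Marko", "age": 29},
    "E": {"type": "person", "name": "Johan", "age": 37},
    "D": {"type": "article", "name": "Linked..."},
    ("A", "D"): {"type": "authored", "weight": 1.0},  # assumed pairing
    ("E", "D"): {"type": "authored", "weight": 0.5},  # assumed pairing
}

def lam(element, key):
    """Look up the value stored under key for a vertex or an edge (None if absent)."""
    return properties[element].get(key)

print(lam("A", "name"), lam(("E", "D"), "weight"))
```

Keying the same dictionary by both vertices and edge tuples mirrors the domain (V ∪ E) × Ω of λ; a real graph package would typically expose this through vertex and edge objects instead.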
26. Outline
• Introduction to Graph Structures
The Single-Relational Graph
The Multi-Relational Graph
• A Multi-Relational Path Algebra
• Application to Recommender Systems
27. Problem Statement
• There is a need to port all the known single-relational graph analysis
algorithms over to the multi-relational domain.
Why?: There is a large body of algorithms in the domain of single-
relational graph analysis.
Why?: Multi-relational graph structures are becoming more prevalent
and can be used to model more complex structures.
• The set of single-relational graph analysis algorithms should not be
“blindly” applied to multi-relational graphs.
Why?: For example, ⟨marko, knows, johan⟩ says more about social
communication than ⟨marko, livesInSameCityAs, bob⟩.
Why?: Multi-relational graph analysis algorithms must respect the
meaning of the edges.
28. Solution Statement
• Provide an algebra to map a multi-relational graph to a
“semantically-rich” single-relational graph that can be subjected
to all the known single-relational graph analysis algorithms.
Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to
Single-Relational Network Analysis Algorithms,” Journal of Informetrics,
ISSN:1751-1577, Elsevier, doi:10.1016/j.joi.2009.06.004,
http://arxiv.org/abs/0806.2274, LA-UR-08-03931, in press, 2009.
29. A Three-Way Tensor Representation of a
Multi-Relational Graph
As stated previously, a three-way tensor can be used to represent a
multi-relational graph. If
$$G = (V, \mathbb{E} = \{E_0, E_1, \ldots, E_m \subseteq (V \times V)\})$$
is a multi-relational graph, then $A \in \{0, 1\}^{n \times n \times m}$ and
$$A^k_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E_k : k \le m \\ 0 & \text{otherwise.} \end{cases}$$
A is the three-way tensor representation of the multi-relational graph.
30. The General Purpose of the Path Algebra
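• Map a multi-relational tensor $A \in \{0, 1\}^{n \times n \times m}$ to a single-relational path matrix
$Z \in \mathbb{R}_+^{n \times n}$; this path matrix is a weighted single-relational graph.
[Figure: a tensor $A \in \{0, 1\}^{n \times n \times m}$, drawn as stacked adjacency matrices, is mapped to the path matrix
$$Z = \begin{bmatrix} 24 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 4 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 15.3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix} \in \mathbb{R}_+^{n \times n},$$
which is equivalent to a weighted directed graph on five vertices.]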
• The created single-relational graph’s edges are loaded with meaning. For example,
given the right tensor, it is possible to create a coauthorship graph for scholars from
the same university who are not on the same project, but share a graduate student.
• The theorems of the algebra can be used to manipulate your operation to a more
efficient form.
31. The Elements of the Path Algebra
• $A \in \{0, 1\}^{n \times n \times m}$: a three-way tensor representation of a multi-relational
graph.
• $Z \in \mathbb{R}_+^{n \times n}$: a path matrix derived by means of operations applied to A.

• $C_j \in \{0, 1\}^{n \times n}$: a “to” path filter.
• $R_i \in \{0, 1\}^{n \times n}$: a “from” path filter.
• $E_{i,j} \in \{0, 1\}^{n \times n}$: an entry path filter.
• $I \in \{0, 1\}^{n \times n}$: the identity matrix as a self-loop filter.
• $\mathbf{1} \in 1^{n \times n}$: a matrix in which all entries are equal to 1.
• $\mathbf{0} \in 0^{n \times n}$: a matrix in which all entries are equal to 0.
32. The Operations of the Path Algebra
• $A \cdot B$: ordinary matrix multiplication determines the number of (A, B)-
paths between vertices.
• $A^\top$: matrix transpose inverts path directionality.
• $A \circ B$: Hadamard, entry-wise multiplication applies a filter to selectively
exclude paths.
• $n(A)$: not generates the complement of a $\{0, 1\}^{n \times n}$ matrix.
• $c(A)$: clip generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix.
• $v^\pm(A)$: vertex generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix, where
only certain rows or columns contain non-zero values.
• $\lambda A$: scalar multiplication weights the entries of a matrix.
• $A + B$: matrix addition merges paths.
33. Example Scholarly Tensor Used in the Remainder of the Presentation
• A1 authored : human → article
• A2 cites : article → article
• A3 contains : journal → article
• A4 category : journal → subject category
• A5 developed : human → program/software.
34. The Traverse Operation
• An interesting aspect of the single-relational adjacency matrix $A \in \{0, 1\}^{n \times n}$ is that when it is raised
to the kth power, the entry $A^{(k)}_{i,j}$ is equal to the number of paths of length k that connect vertex i to
vertex j.
• Given, by definition, that $A_{i,j}$ (i.e. $A^{(1)}_{i,j}$) represents the number of paths that go from i to j of length
1 (i.e. a single edge) and by the rules of ordinary matrix multiplication,
$$A^{(k)}_{i,j} = \sum_{l \in V} A^{(k-1)}_{i,l} \cdot A_{l,j} : k \ge 2.$$
With rows and columns ordered a, b, c:
$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
There is a path of length 2 from a to c.
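The three-vertex example can be reproduced with a small pure-Python sketch. The helper names `matmul` and `path_counts` are hypothetical and chosen only for this illustration.

```python
def matmul(X, Y):
    """Ordinary matrix multiplication on lists of lists."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def path_counts(A, k):
    """Return the k-th power of A: entry (i, j) counts paths of length k from i to j."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

# Rows and columns are ordered a, b, c, with edges a -> b and b -> c.
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]

A2 = path_counts(A, 2)
A3 = path_counts(A, 3)
print(A2)
```

The only non-zero entry of `A2` is in row a, column c, and `A3` is all zeros because this graph has no path of length 3.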
35. The Traverse Operation
Legend: $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed
$$Z = A^1 \cdot A^2 \cdot {A^1}^\top,$$
$Z_{i,j}$ defines the number of paths from vertex i to vertex j such that a path goes from author i to one of the
articles he or she has authored, from that article to one of the articles it cites, and finally, from that cited
article to its author j. Semantically, Z is an author-citation single-relational path matrix.
[Figure: Human A authored ($A^1$) Article B, which cites ($A^2$) Article C, which was authored ($A^1$) by Human D; the derived author-citation edge (Z) runs from Human A to Human D.]
• NOTE: All diagrams are with respect to a “source” vertex (the blue vertex) in order to preserve clarity. In reality, the
operations operate on all vertices in parallel.
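A small sketch of the author-citation traversal, assuming the four vertices named in the diagram (Human A, Article B, Article C, Human D) and reading the final step as a traversal of "authored" against its direction, i.e. through the transpose. The helper names are hypothetical.

```python
def matmul(X, Y):
    """Ordinary matrix multiplication on lists of lists."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def from_edges(edges, size):
    """Build a 0/1 adjacency matrix from (tail, head) index pairs."""
    M = [[0] * size for _ in range(size)]
    for i, j in edges:
        M[i][j] = 1
    return M

HUMAN_A, ARTICLE_B, ARTICLE_C, HUMAN_D = range(4)

A1 = from_edges([(HUMAN_A, ARTICLE_B), (HUMAN_D, ARTICLE_C)], 4)  # authored
A2 = from_edges([(ARTICLE_B, ARTICLE_C)], 4)                      # cites

# author -> article -> cited article -> author of the cited article
Z = matmul(matmul(A1, A2), transpose(A1))
print(Z[HUMAN_A][HUMAN_D])
```

The product is computed for every source vertex at once, which is the matrix counterpart of the note that the operations act on all vertices in parallel.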
37. The Filter Operation
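• $A \circ \mathbf{1} = A$
• $A \circ \mathbf{0} = \mathbf{0}$
• $A \circ B = B \circ A$
• $A \circ (B + C) = (A \circ B) + (A \circ C)$
• $A^\top \circ B^\top = (A \circ B)^\top$.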
38. The Not Filter
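The not filter is useful for excluding a set of paths to or from a vertex.
$$n : \{0, 1\}^{n \times n} \to \{0, 1\}^{n \times n}$$
with a function rule of
$$n(A)_{i,j} = \begin{cases} 1 & \text{if } A_{i,j} = 0 \\ 0 & \text{otherwise.} \end{cases}$$
$$n\left(\begin{bmatrix} 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$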
39. The Not Filter
If A ∈ {0, 1}n×n, then
• n(n(A)) = A
• A ◦ n(A) = 0
• n(A) ◦ n(A) = n(A).
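The not filter and its three identities can be spot-checked on the 5 × 5 example matrix from the preceding slide. This is a check on one matrix, not a proof; the function names are hypothetical.

```python
def n(A):
    """Not filter: 1 where the entry is 0, and 0 otherwise."""
    return [[1 if x == 0 else 0 for x in row] for row in A]

def hadamard(X, Y):
    """Entry-wise (Hadamard) product."""
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[0, 0, 1, 1, 1],
     [1, 0, 1, 0, 1],
     [0, 1, 1, 1, 1],
     [1, 1, 0, 1, 1],
     [1, 1, 1, 1, 0]]

complement = n(A)
zero = [[0] * 5 for _ in range(5)]

print(n(n(A)) == A)                   # n(n(A)) = A
print(hadamard(A, n(A)) == zero)      # A o n(A) = 0
print(hadamard(n(A), n(A)) == n(A))   # n(A) o n(A) = n(A)
```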
40. The Not Filter
Legend: $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed
A coauthorship path matrix is
$$Z = A^1 \cdot {A^1}^\top \circ n(I)$$
[Figure: Human A and Human C both authored ($A^1$) Article B, which yields a coauthor edge (Z) between them; n(I) removes the coauthor self-loop.]
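A sketch of the coauthorship matrix on the three vertices of the diagram (Human A, Article B, Human C). It assumes the second factor is the transpose of the "authored" slice, so that a path runs author → article → author; the helper names are hypothetical.

```python
def matmul(X, Y):
    """Ordinary matrix multiplication on lists of lists."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def hadamard(X, Y):
    """Entry-wise (Hadamard) product."""
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def n(A):
    """Not filter: 1 where the entry is 0, and 0 otherwise."""
    return [[1 if x == 0 else 0 for x in row] for row in A]

HUMAN_A, ARTICLE_B, HUMAN_C = range(3)

# authored: Human A -> Article B and Human C -> Article B
A1 = [[0, 1, 0],
      [0, 0, 0],
      [0, 1, 0]]
I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

unfiltered = matmul(A1, transpose(A1))   # still contains self-coauthorship
Z = hadamard(unfiltered, n(I))           # n(I) zeroes the diagonal
print(Z)
```

Without the `n(I)` filter, every author would appear as their own coauthor, once for each article they wrote.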
41. The Clip Filter
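The general purpose of clip is to take a path matrix and “clip,” or
normalize, it to a $\{0, 1\}^{n \times n}$ matrix.
$$c : \mathbb{R}_+^{n \times n} \to \{0, 1\}^{n \times n}$$
$$c(Z)_{i,j} = \begin{cases} 1 & \text{if } Z_{i,j} > 0 \\ 0 & \text{otherwise.} \end{cases}$$
$$c\left(\begin{bmatrix} 24 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 4 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 15.3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$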
42. The Clip Filter
If $A, B \in \{0, 1\}^{n \times n}$ and $Y, Z \in \mathbb{R}_+^{n \times n}$, then
• c(A) = A
• c(n(A)) = n(c(A)) = n(A)
• c(Y ◦ Z) = c(Y) ◦ c(Z)
• n(A ◦ B) = c (n(A) + n(B))
• n(A + B) = n(A) ◦ n(B)
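The clip identities can be spot-checked numerically. The matrices below are arbitrary small examples chosen for this sketch, so a passing check illustrates the identities rather than proving them.

```python
def n(A):
    """Not filter: 1 where the entry is 0, and 0 otherwise."""
    return [[1 if x == 0 else 0 for x in row] for row in A]

def c(Z):
    """Clip filter: 1 where the entry is positive, and 0 otherwise."""
    return [[1 if x > 0 else 0 for x in row] for row in Z]

def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Arbitrary example matrices: A, B are 0/1; Y, Z are non-negative reals.
A = [[1, 0], [0, 1]]
B = [[1, 1], [0, 0]]
Y = [[2.5, 0], [3, 0]]
Z = [[0, 4], [1.5, 0]]

checks = {
    "c(A) = A": c(A) == A,
    "c(n(A)) = n(c(A)) = n(A)": c(n(A)) == n(c(A)) == n(A),
    "c(Y o Z) = c(Y) o c(Z)": c(hadamard(Y, Z)) == hadamard(c(Y), c(Z)),
    "n(A o B) = c(n(A) + n(B))": n(hadamard(A, B)) == c(add(n(A), n(B))),
    "n(A + B) = n(A) o n(B)": n(add(A, B)) == hadamard(n(A), n(B)),
}
for name, holds in checks.items():
    print(name, holds)
```

The last two identities are the De Morgan laws of the algebra, with the Hadamard product acting as "and" and clipped addition acting as "or".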
43. The Clip Filter
Legend: $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed
Suppose we want to create an author citation path matrix that does not allow self citation or coauthor
citations.
$$Z = \underbrace{\left(A^1 \cdot A^2 \cdot {A^1}^\top\right)}_{\text{cites}} \circ \underbrace{n\left(c\left(A^1 \cdot {A^1}^\top \circ n(I)\right)\right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self}}$$
[Figure: the author-citation path (Z) from Human A through Article B (authored, $A^1$), its citation of Article C ($A^2$), and on to Human D (authored, $A^1$), shown alongside the excluded coauthor relation between Human A and Human E, filtered by $n(c(A^1 \cdot {A^1}^\top \circ n(I)))$, and the excluded self loop, filtered by $n(I)$.]
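44. The Clip Filter
Legend: $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed
However, using various theorems of the algebra,
$$Z = \underbrace{\left(A^1 \cdot A^2 \cdot {A^1}^\top\right)}_{\text{cites}} \circ \underbrace{n\left(c\left(A^1 \cdot {A^1}^\top \circ n(I)\right)\right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self}}$$
becomes
$$Z = \left(A^1 \cdot A^2 \cdot {A^1}^\top\right) \circ n\left(c\left(A^1 \cdot {A^1}^\top\right)\right) \circ n(I).$$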
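The following sketch builds a small hypothetical scholarly graph and checks that the long and the simplified forms of the filtered author-citation matrix agree on it. The vertices and edges are assumptions made for this example, and the placement of transposes and parentheses follows the reading "author → article → cited article → author".

```python
def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def n(A):
    return [[1 if x == 0 else 0 for x in row] for row in A]

def c(Z):
    return [[1 if x > 0 else 0 for x in row] for row in Z]

def from_edges(edges, size):
    M = [[0] * size for _ in range(size)]
    for i, j in edges:
        M[i][j] = 1
    return M

HUMAN_A, HUMAN_E, HUMAN_D, ARTICLE_B, ARTICLE_C = range(5)

# Hypothetical data: A and E wrote Article B; E and D wrote Article C; B cites C.
A1 = from_edges([(HUMAN_A, ARTICLE_B), (HUMAN_E, ARTICLE_B),
                 (HUMAN_E, ARTICLE_C), (HUMAN_D, ARTICLE_C)], 5)
A2 = from_edges([(ARTICLE_B, ARTICLE_C)], 5)
I = [[1 if i == j else 0 for j in range(5)] for i in range(5)]

cites = matmul(matmul(A1, A2), transpose(A1))
coauthors = matmul(A1, transpose(A1))

Z = hadamard(hadamard(cites, n(c(hadamard(coauthors, n(I))))), n(I))
Z_simplified = hadamard(hadamard(cites, n(c(coauthors))), n(I))
print(Z == Z_simplified)
```

On this data only the citation from Human A to Human D survives: Human A citing coauthor Human E, Human E citing coauthor Human D, and Human E's self citation are all filtered out. The simplified form saves one Hadamard product because the trailing `n(I)` already removes the diagonal.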
45. The Vertex Filter
In many cases, it is important to filter out particular paths to and from a
vertex.
$$v^- : \mathbb{R}_+^{n \times n} \times \mathbb{N} \to \{0, 1\}^{n \times n},$$
$$v^-(Z)_{i,j} = \begin{cases} 1 & \text{if } \sum_{k \in V} Z_{k,j} > 0 \\ 0 & \text{otherwise} \end{cases}$$
turns a non-zero column into an all 1-column and
$$v^+ : \mathbb{R}_+^{n \times n} \times \mathbb{N} \to \{0, 1\}^{n \times n},$$
$$v^+(Z)_{i,j} = \begin{cases} 1 & \text{if } \sum_{k \in V} Z_{i,k} > 0 \\ 0 & \text{otherwise} \end{cases}$$
turns a non-zero row into an all 1-row.
46. The Vertex Filter
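$$v^-\left(\begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 32 & 0 \\ 0 & 23 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}\right) = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix}$$
$v^+$ is not diagrammed, but acts the same except that it makes 1-rows. Two important filters are the column and
row filters, $C \in \{0, 1\}^{n \times n}$ and $R \in \{0, 1\}^{n \times n}$, respectively.
$$C_2 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix} \qquad R_3 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$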
47. The Vertex Filter
• $v^-(C_i) = C_i$
• $v^+(R_j) = R_j$
• $v^-(Z) = v^+(Z^\top)^\top$
• $v^+(Z) = v^-(Z^\top)^\top$.
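A sketch of the two vertex filters, following the prose description and the worked example (the minus form produces 1-columns, the plus form produces 1-rows). The function names are hypothetical, and the filter constructors take 1-based indices so that `column_filter(2, 5)` corresponds to the matrix written as C2 on the slides.

```python
def transpose(Z):
    return [list(col) for col in zip(*Z)]

def v_minus(Z):
    """Entry (i, j) is 1 if column j of Z has any positive entry (1-columns)."""
    size = len(Z)
    hit = [any(Z[k][j] > 0 for k in range(size)) for j in range(size)]
    return [[1 if hit[j] else 0 for j in range(size)] for _ in range(size)]

def v_plus(Z):
    """Entry (i, j) is 1 if row i of Z has any positive entry (1-rows)."""
    size = len(Z)
    return [[1 if any(x > 0 for x in Z[i]) else 0] * size for i in range(size)]

def column_filter(j, size):
    """Column filter with a 1-based index: column j is all ones."""
    return [[1 if col == j - 1 else 0 for col in range(size)] for _ in range(size)]

def row_filter(i, size):
    """Row filter with a 1-based index: row i is all ones."""
    return [[1 if row == i - 1 else 0 for _ in range(size)] for row in range(size)]

Z = [[0, 1, 0, 1, 0],
     [0, 0, 0, 0, 0],
     [0, 2, 0, 32, 0],
     [0, 23, 0, 0, 0],
     [0, 0, 0, 0, 0]]
C2 = column_filter(2, 5)
R3 = row_filter(3, 5)

print(v_minus(Z))
print(v_minus(Z) == transpose(v_plus(transpose(Z))))
```

The transpose identities mean only one of the two filters needs a direct implementation; the other can be derived from it.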
48. The Vertex Filter
Legend: $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed
Assume that vertex 1 is the social science subject category vertex and we want to create a journal
citation graph for social science journals only.
$$Z = \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]}_{\text{soc.sci. journal articles}} \cdot A^2 \cdot \underbrace{\left[{A^3}^\top \circ v^-\left(R_1 \circ {A^4}^\top\right)\right]}_{\text{articles in soc.sci. journals}}.$$
[Figure: Journal A and Journal E have category ($A^4$) Social Science (vertex 1). Journal A contains ($A^3$) Article B, which cites ($A^2$) Article C, contained in Journal E, and Article D, contained in Journal F. The derived social-science journal citation edge (Z) runs from Journal A to Journal E.]
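50. The Vertex Filter
Legend: $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed
$$Z = \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]}_{\text{soc.sci. journal articles}} \cdot A^2 \cdot \underbrace{\left[{A^3}^\top \circ v^-\left(R_1 \circ {A^4}^\top\right)\right]}_{\text{articles in soc.sci. journals}}.$$
However,
$$\begin{aligned} v^-\left(R_1 \circ {A^4}^\top\right) &= v^-\left(\left(C_1 \circ A^4\right)^\top\right) && \left(C_x^\top = R_x\right) \\ &= v^+\left(C_1 \circ A^4\right)^\top && \left(v^+(Z) = v^-(Z^\top)^\top\right). \end{aligned}$$
Therefore, because $A^\top \circ B^\top = (A \circ B)^\top$,
$$Z = \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]}_{\text{reused}} \cdot A^2 \cdot \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]^\top}_{\text{reused}}.$$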
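The equivalence of the two journal-citation forms can be checked on a small hypothetical graph. Everything in the data below (three journals, three articles, a second "physics" category) is an assumption made for the example, and the social science category is placed at index 0 because Python indexes from 0.

```python
def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def v_plus(Z):
    """1-rows: row i becomes all ones if it has any positive entry."""
    return [[1 if any(x > 0 for x in row) else 0] * len(Z) for row in Z]

def v_minus(Z):
    """1-columns: derived from v_plus through the transpose identity."""
    return transpose(v_plus(transpose(Z)))

def from_edges(edges, size):
    M = [[0] * size for _ in range(size)]
    for i, j in edges:
        M[i][j] = 1
    return M

SIZE = 8
SOC_SCI, JOURNAL_A, JOURNAL_E, JOURNAL_F, ART_B, ART_C, ART_D, PHYSICS = range(SIZE)

A2 = from_edges([(ART_B, ART_C), (ART_B, ART_D)], SIZE)                                # cites
A3 = from_edges([(JOURNAL_A, ART_B), (JOURNAL_E, ART_C), (JOURNAL_F, ART_D)], SIZE)    # contains
A4 = from_edges([(JOURNAL_A, SOC_SCI), (JOURNAL_E, SOC_SCI), (JOURNAL_F, PHYSICS)], SIZE)  # category

C = [[1 if j == SOC_SCI else 0 for j in range(SIZE)] for _ in range(SIZE)]  # column filter
R = transpose(C)                                                            # row filter

left = hadamard(v_plus(hadamard(C, A4)), A3)
right = hadamard(transpose(A3), v_minus(hadamard(R, transpose(A4))))

Z_original = matmul(matmul(left, A2), right)
Z_simplified = matmul(matmul(left, A2), transpose(left))
print(Z_original == Z_simplified)
```

In the simplified form the filtered "contains" matrix is computed once and reused, transposed, for the return leg of the path.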
52. The Weight and Merge Operations
Legend: $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed
$$Z = \underbrace{0.6 \left(A^1 \cdot {A^1}^\top \circ n(I)\right)}_{\text{coauthorship}} + \underbrace{0.4 \left(A^5 \cdot {A^5}^\top \circ n(I)\right)}_{\text{co-development}}$$
merges the article and software program collaboration path matrices as
specified by their respective weights of 0.6 and 0.4. The semantics of the
resultant is a software program and article collaboration path matrix that
favors article collaboration over software program collaboration. A
simplification of the previous composition is
$$Z = \left(0.6 \left(A^1 \cdot {A^1}^\top\right) + 0.4 \left(A^5 \cdot {A^5}^\top\right)\right) \circ n(I).$$
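A sketch of the weighted merge on hypothetical data: three humans, one article, and one program. It checks that filtering each term separately and filtering the weighted sum once give the same matrix on this example; the helper names are assumptions of the sketch.

```python
def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(weight, X):
    return [[weight * x for x in row] for row in X]

def n(A):
    return [[1 if x == 0 else 0 for x in row] for row in A]

def from_edges(edges, size):
    M = [[0] * size for _ in range(size)]
    for i, j in edges:
        M[i][j] = 1
    return M

HUMAN_A, HUMAN_C, HUMAN_G, ARTICLE, PROGRAM = range(5)

A1 = from_edges([(HUMAN_A, ARTICLE), (HUMAN_C, ARTICLE)], 5)   # authored
A5 = from_edges([(HUMAN_A, PROGRAM), (HUMAN_G, PROGRAM)], 5)   # developed
not_I = n([[1 if i == j else 0 for j in range(5)] for i in range(5)])

coauthor = matmul(A1, transpose(A1))
codevelop = matmul(A5, transpose(A5))

Z = add(scale(0.6, hadamard(coauthor, not_I)),
        scale(0.4, hadamard(codevelop, not_I)))
Z_simplified = hadamard(add(scale(0.6, coauthor), scale(0.4, codevelop)), not_I)
print(Z[HUMAN_A][HUMAN_C], Z[HUMAN_A][HUMAN_G])
```

The simplification relies on the Hadamard product distributing over addition, so the self-loop filter is applied once instead of twice.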
53. Outline
• Introduction to Graph Structures
The Single-Relational Graph
The Multi-Relational Graph
• A Multi-Relational Path Algebra
• Application to Recommender Systems
54. kReef: A Scholarly Recommendation Engine
1. The scholarly community is modeled using a multi-relational graph.
2. A “walker”-version of the path algebra is applied to the graph to support scholars.
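[Figure: the kReef architecture. A Graphical User Interface sits above the Translators, the Analytics Engine, and the Grammar Walker Engine (2), which operate over a Multi-Relational Graph Database (1) holding the ontology and its instances.]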
Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication
Process,” KRS-2009-02, http://arxiv.org/abs/0905.1594, 2009.
55. kReef: Ontology Classes
[Figure: the kReef class hierarchy, rooted at core:Reefsource. Classes shown: core:Agent, core:Item, core:Event, core:Group, core:Person, core:Document, core:Collection, core:Conference, core:Course, core:Organization, core:Project, core:Article, core:Book, core:Meeting, core:Viewgraph, core:Journal, core:FundingOpportunity, core:Panel, core:Academic, core:Webpage, core:Library, core:Dataset, core:Presentation, core:Commerical, core:Magazine, core:Media, core:Software, core:Government, core:Newspaper, core:Session, core:Keynote, core:Call, core:Audio, core:Proceedings, core:SocialEvent, core:Image, core:CallForChapters, core:Tutorial, core:Video, core:Workshop, core:CallForPapers, core:CallForProposals, core:CallForTutorials, core:CallForWorkshops.]
• NOTE: All edges denote an rdfs:subClassOf relationship (either directly or inferred).
57. kReef: Instance Data Ingestion
[Figure: instance data flows into the Multi-Relational Graph Database (ontology and instances) from Connotea, arXiv, CiteULike, CogPrints, CiteSeer, BibSonomy, CrossRef, and publishers such as ACM, IEEE, IOP, Springer, Blackwell, Elsevier, etc.]
58. kReef: Grammar Walker Engine Overview
• A walker-based implementation of the path algebra is applied to the
scholarly model in order to support scholars in their professional lives.
The path description is known as a “grammar” because it can be modeled
as a finite state machine embedded in the walker.
identify articles related to some interesting resource.
identify collaborators for a funding opportunity.
identify a publication venue for a newly created article.
identify referees to review an article.
identify resources of interest in one’s community.
Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739,
http://arxiv.org/abs/0803.4355, 2008.
59. kReef: Grammar Walker Engine Algorithm, Part 1
• First, when trying to solve a recommendation problem, determine which
abstract path should be searched to find a solution. The choice usually
starts as a hunch and is then validated using real-world data.
For example, what makes a good peer-reviewer/referee for an article:
someone that is cited by the article and their respective coauthors.
Moreover, a referee should not include the authors of the article or
their coauthors one step away in the coauthorship network (conflict of
interest).
• Let us denote the path description/grammar/constraint ψ.
Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information and Knowledge
Management (CIKM), pp. 319–328, http://arxiv.org/abs/cs/0605112, 2008.
60. kReef: Grammar Walker Engine Algorithm, Part 2
• Program a collection of discrete walkers to traverse the abstract
path defined by ψ. Each walker starts at some vertex i ∈ V with
a real-valued energy. As it walks the graph, its energy decays.
Given the peer-review/referee example, the source vertex is the article
that requires a set of referees.
[Figure: walkers leave the source vertex i and follow the path defined by ψ over time steps t = 1, 2, 3.]
61. kReef: Grammar Walker Engine Algorithm, Part 3
• The solution to the problem is where the highest energy flow in
the network exists after k time steps.
Given the peer-review example, the highest energy vertices are those
people most competent to review the article in question.
In short,
Ψ × P(V ) → ω,
where Ψ is the set of all grammars, P(V ) is the set of all sets of source
vertices, and ω : V → R is the resultant energy flow for each vertex in the
graph. Or,
Grammar (path description) × Set<Vertex> (source vertices) → Map<Vertex, Double> (ranked results).
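The signature above can be illustrated with a deliberately simplified walker. This is a sketch under stated assumptions, not the kReef implementation: the grammar is reduced to an ordered list of edge labels instead of a finite state machine, energy is split evenly over the matching outgoing edges, the decay factor of 0.5 is arbitrary, and the function reports only the energy at the end of the path. The graph data, the `authoredBy` label, and the name `walk` are all hypothetical.

```python
from collections import defaultdict

def walk(graph, grammar, sources, energy=1.0, decay=0.5):
    """Push energy from the source vertices along the edge labels in grammar.

    graph:   {label: {tail: [heads]}}
    grammar: ordered list of edge labels to follow, one per time step
    Returns a {vertex: energy} map for the vertices reached at the last step.
    """
    current = {vertex: energy for vertex in sources}
    for label in grammar:
        reached = defaultdict(float)
        for vertex, e in current.items():
            heads = graph.get(label, {}).get(vertex, [])
            if not heads:
                continue  # a walker with no legal step is dropped
            share = e * decay / len(heads)  # decay, then split evenly
            for head in heads:
                reached[head] += share
        current = dict(reached)
    return current

# Hypothetical data: article_x cites two articles; p2 authored both of them.
graph = {
    "cites": {"article_x": ["article_y", "article_z"]},
    "authoredBy": {"article_y": ["p1", "p2"], "article_z": ["p2"]},
}

ranking = walk(graph, ["cites", "authoredBy"], {"article_x"})
best = max(ranking, key=ranking.get)
print(ranking, best)
```

Here `p2` collects the most energy because both cited articles lead to that author. A fuller treatment of the referee example would also need the conflict-of-interest exclusions described earlier, which this sketch leaves out.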
62. Other Application Scenarios
• Populating metadata-poor resources with data propagated from metadata-rich
resources. Walkers take particular paths, pick up metadata from
rich resources, and attach metadata to atrophied resources.
Rodriguez M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks,” ACM
Transactions on Information Systems, 27(2), pp. 1–20, http://arxiv.org/abs/0807.0023, 2009.
• Generate a context-sensitive representative decision-making structure that
reflects the voting behavior of the full population even as the actual voting
population wanes in size.
Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii
International Conference on Systems Science (HICSS), pp. 39–49, http://arxiv.org/abs/cs/0609034, 2007.
63. Future Work in this Area
• Further develop the path algebra. Explore other matrix and tensor
operations and determine if they are meaningful in the context of
manipulating multi-relational graphs.
• Develop a programming language (Turing Complete?) to easily
represent path descriptions for walkers. Make it easier for developers
to deploy swarms of walkers within a multi-relational network for various
application scenarios.
Recommender systems
Vertex and edge ranking systems
Information retrieval systems
General graph analysis
64. Conclusion
• Thank you for your time...
My homepage: http://markorodriguez.com
Linked Process: http://linkedprocess.org
Neno/Fhat: http://neno.lanl.gov
Collective Decision Making Systems: http://cdms.lanl.gov
Faith in the Algorithm: http://faithinthealgorithm.net
MESUR: http://www.mesur.org