Alexander Troussov, Ph.D., IBM Dublin Software Lab
16th of April 2011, Mathlingvo Seminar, St.Petersburg State University, Russia




Graph-based methods
to exploit “weak” knowledge




                                                                                 © 2011 Alexander Troussov
About AT



    IBM Ireland Center for Advanced Studies - Chief Scientist
    IBM LanguageWare group – the Architect
    National Geophysical Data Center, Boulder, CO, USA - Visiting scientist
     – Fuzzy logic based search engine for search in large databases when exact parameters
       of search are hard to define
    Observatoire de la Côte d’Azur, Nice, France – Visiting scientist
     – numerical simulation in stochastic physics
    Institute of Physics of the Earth (Russian Academy of Sciences) and the International
    Institute for Earthquake Prediction Theory and Mathematical Geophysics, Moscow, Russia -
    Lead Researcher
      – R&D in geophysics and geoinformatics
    System programming at the Institute of Precise Mechanics, Moscow
    PhD in Mathematics from Lomonosov Moscow State University



Natural Language Understanding is Inferencing (?)



    From a computational point of view,
    natural language understanding
    is inferencing

      – Text which mentions
               Malahide
        is probably about
               Canada (??)




          Malahide (Canada 2006 Census population
          8,828) is a township in Elgin County, Ontario,
          Canada




Source: Troussov et al. MITACS, Canada, 2010

Inferencing



    Terms are ambiguous, and our knowledge is never “the truth, the whole truth, and nothing
    but the truth”
      – Malahide, Co. Dublin
      – Malahide is a township in Elgin County, Ontario, Canada.
      – Paradis Gisenyi Malahide is a hotel in Rwanda
    Solution (Troussov et al. MITACS, Canada, 2010 ): propagation from multiple concepts, for
    instance, the initial seed for the activation propagation starts at two nodes in a geographical
    taxonomy: Malahide (Ontario) and Malahide (Co. Dublin) as well as from other concepts
    mentioned in the text
          • Text which mentions Malahide and Europe – is a little bit more likely to be about
            Ireland than about Canada
          • Text which mentions Malahide and Clontarf – is more likely to be about
           Ireland than about Canada
         • …
         • Cohesive, coherent text which mentions Malahide, Mulhuddart, Lansdowne,
           Clontarf and Donabate – is almost certainly about Dublin
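
    The effect of seeding from several concepts at once can be sketched in a few lines. This is only an illustration under stated assumptions: the toy taxonomy `PARENT`, the sense table `SENSES`, the `decay` factor and the rule of splitting a seed equally among competing senses are all hypothetical and are not taken from the cited MITACS work. Activation here flows only upward, from a place to the regions that contain it.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); an assumption for illustration.
PARENT = {
    "Malahide (Co. Dublin)": "Dublin",
    "Clontarf": "Dublin",
    "Donabate": "Dublin",
    "Dublin": "Ireland",
    "Ireland": "Europe",
    "Malahide (Ontario)": "Ontario",
    "Ontario": "Canada",
}

# Each surface term maps to every concept it might denote.
SENSES = {
    "Malahide": ["Malahide (Co. Dublin)", "Malahide (Ontario)"],
    "Clontarf": ["Clontarf"],
    "Donabate": ["Donabate"],
}


def score_regions(terms, decay=0.5):
    """Seed every sense of every term, then push activation up the taxonomy."""
    activation = defaultdict(float)
    for term in terms:
        senses = SENSES[term]
        for node in senses:
            level = 1.0 / len(senses)  # split the seed among competing senses
            while node is not None:
                activation[node] += level
                node = PARENT.get(node)  # None once the top of the tree is reached
                level *= decay
    return activation


alone = score_regions(["Malahide"])
with_context = score_regions(["Malahide", "Clontarf", "Donabate"])
```

    With "Malahide" alone, Ireland and Canada receive the same activation in this toy setup; adding "Clontarf" and "Donabate" tips the balance toward Ireland, because their activation meets that of the Dublin sense at shared ancestors.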



Knowledge, Lexico-Semantic Resource




        Text             Relevancy




Text – Semantic Network


                                         NETWORK OF CONCEPTS




                                                        Finding “focus” concept
 Mapping of term mentions to concepts
 .




     Mention   Mention   Mention   Mention

                         TEXT

NLU as inferencing




    The concept of a car is relevant to a text.
    Car IS-A “on-land travel” (?)
    Therefore “on-land travel” is somewhat relevant to the text, …




Demo



    – 2 1 Spreading Activation.pdf




Agenda



 Introduction
 Building Semantic Model
 Spreading Activation (SA)
 Research Challenges
  – Why SA
  – Reliability of inferencing
  – What is the purpose of graph operations
 Centrality, network flow methods
 Zoo of algorithms
 Nepomuk Recommender




Spreading Activation Methods




     There is an increased need for a new generic and formal understanding of spreading
     activation as a class of algorithms rather than a particular algorithm with many parameters.
     Spreading activation (also known as spread of activation) is a method for searching
     associative networks, neural networks or semantic networks. The method is based on the
     idea of quickly spreading an associative relevancy measure over the network. Our goal is to
     give an expanded introduction to the method and to show in sufficient detail that it can be
     applied to very diverse problems and applications. We present the method as a general
     framework: a very general class of algorithms on large (or very large) so-called
     multidimensional networks, which serve as the mathematical model.




Source:   Troussov, Levner, Bogdan, Judge, Botvich “Spreading activation methods”
We present spreading activation in a generic form, as a set of methods suitable for mining
     multidimensional networks with oriented weighted links. These graphmining methods might
     produce results similar to those which might be achieved by soft clustering and fuzzy
     inferencing. The input object is a function on nodes of the network, and the spread of
     activation is a technique which provides “spreading” of this function through the network
     links. The result of the spreading activation is a new function on the nodes. The properties of
     that function strongly depend on the original function and the parameters of the spreading
     activation. For instance, when the underlying network is a network of ontological concepts,
     parameters governing spread might be chosen in such a way that allows “smoothing” of the
     original function and interpreting the resulting function as “conceptual” summaries of the
     initial non-zero valued nodes.




Origin of Spreading Activation Methods



     In neurophysiology, interactions between neurons are modeled by way of activation which
     propagates from one neuron to another via connections called synapses, which transmit
     information using chemical signals. The first spreading activation models were used in
     cognitive psychology to model the processes of memory retrieval (Collins, A.M. & Loftus,
     E.F., 1975; Anderson, J., 1983).
     This framework was later exploited in Artificial Intelligence (AI) as a processing framework
     for semantic networks and ontologies, and applied to Information Retrieval (Crestani, F.,
     1997; Aleman-Meza, Halaschek, Arpinar, & Sheth, 2003; Rocha, C., Schwabe, D. & Poggi de
     Aragao, M., 2004; …) as the result of a direct transfer of information retrieval ideas from
     cognitive sciences to AI.




Notation



     A multidimensional network can be modeled as a directed graph, which is a pair
     G = (V, E)
     where
     V – is the set of vertices v_i
     E – is the set of edges e_j (in oriented graphs, edges are also referred to as arcs)
     init: E → V – is the mapping which provides the initial node of each arc
     term: E → V – is the mapping which provides the terminal node of each arc
     imp – is the importance value of arcs and nodes.
     For instance, imp(v), where the node v is a geographical location, might be its population; imp(e)
     might be the number of phone calls from person init(e) to person term(e).
     w – “weights”, for instance, a sigmoidal function of imp.
     w(e_j) = 0 means that the arc e_j is effectively ignored
     w(e_j) = 1 means that the activation of init(e_j) strongly affects the activation of term(e_j). For instance,
     when the nodes represent “words”, synonym links might be assigned the value 1.
     F – is the “activation” function, usually a real-valued function on the nodes of the network.
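
     One possible way to hold this notation in code is sketched below. The class name `Arc`, the field `kind` (the link type), the logistic form of `weight` and its `midpoint` and `steepness` parameters are assumptions made for the example; the slides only say that w may be a sigmoidal function of imp.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    init: str    # initial node, init(e)
    term: str    # terminal node, term(e)
    kind: str    # link type; the network is multidimensional
    imp: float   # importance, e.g. the number of phone calls


def weight(imp, midpoint=5.0, steepness=1.0):
    """Map an importance value to a weight in (0, 1) with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-steepness * (imp - midpoint)))


V = {"alice", "bob", "carol"}
E = [
    Arc("alice", "bob", "calls", imp=12),   # many calls: weight close to 1
    Arc("bob", "carol", "calls", imp=1),    # few calls: weight close to 0
]
w = {e: weight(e.imp) for e in E}

# Activation function F on nodes: zero everywhere except the seed.
F = {v: 0.0 for v in V}
F["alice"] = 1.0
```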

Generic description of spreading activation methods (SAM)
framework


     1.          Initialisation
                   Sets the parameters of the algorithm, the network, and the initial F as a list of
                      non-zero valued nodes V_n
     2.          Iterations
                   (each iteration is one pulse of SAM)
          – a.   List Expansion
                   The list is expanded to include neighbors (both neighbors reached by following
                      outgoing links and neighbors which have links to the nodes in the list). Newly
                      added nodes receive a zero level of activation.
          – b.   Recomputation
                   The value at each node in the list is recomputed based on the values of the function
                      on nodes which have links to the given node and on the types of connections.
          – c.   List Purging
                   The list is purged: nodes with values less than a threshold are excluded.
          – d.   Conditions Check To Break Iterations
                   For example, a maximum number of iterations to be performed.
     3.          Output
                  The list of nodes (the value of the function after the spread of activation), ranked
                    according to their F values.
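
     The steps above can be sketched as a short loop. This is one member of the SAM family, not a reference implementation: the arc format `(init, term, weight)`, the choice F_new(v) = F(v) + Input(v), the outdegree normalisation with `beta`, and the parameter defaults are assumptions chosen for the example.

```python
from collections import Counter


def spread(arcs, seed, pulses=2, threshold=0.05, beta=1.0):
    """arcs: list of (init, term, weight); seed: dict node -> initial activation."""
    outdegree = Counter(a for a, _, _ in arcs)
    F = dict(seed)                                   # 1. initialisation
    for _ in range(pulses):                          # 2d. stop after a fixed number of pulses
        active = set(F)
        for a, b, _ in arcs:                         # 2a. list expansion, both directions
            if a in active:
                F.setdefault(b, 0.0)
            if b in active:
                F.setdefault(a, 0.0)
        new_F = dict(F)
        for a, b, w in arcs:                         # 2b. recomputation
            if a in F and b in F:
                new_F[b] += F[a] / outdegree[a] ** beta * w
        F = {v: x for v, x in new_F.items() if x >= threshold}   # 2c. list purging
    return sorted(F.items(), key=lambda kv: kv[1], reverse=True)  # 3. ranked output


arcs = [
    ("car", "on-land travel", 0.8),
    ("bus", "on-land travel", 0.8),
    ("car", "vehicle", 0.5),
    ("boat", "vehicle", 0.5),
]
seed = {"car": 1.0, "bus": 1.0}
ranking = spread(arcs, seed, pulses=1)
```

     In this toy run "on-land travel" ends up ranked first, because it receives activation from both seed concepts.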



Generic description of recomputation phase



     We have the list of nodes V_n.
     1. Input/Output Through Links Computation.
      – For each node v we compute the input signal to each arc e, such that init(e)=v. When the
         signal (“activation”) passes through a link e, the activation usually experiences decay by
         a factor w(e)
     2. Input/Output of Node Activation
       – Before the pulse, the node v has the activation level F(v).
           • Through incoming links, v gets more activation. By dissipating the activation through
             outgoing links, the node v might lose activation.
     3. Computation of the New Level of Activation
       – A new value F(v) is computed based on F(v), Input (v), and Output (v)




Generic description of recomputation phase



     1. Input/Output Through Links Computation.
     For each node v we compute the input signal to each arc e, such that init(e)=v. This
     computation can be based on the value F(v), the outdegree of a node etc. For instance, if
     the node v has n outgoing arcs of the same type, each arc e might get input signal:
                       I (e) = F(init(e)) · (1 / outdegree(v)**beta )
     where beta might be equal to 1. It could also be less than one, in which case the node v will
     propagate more activation to its neighbors than it has.
     When the signal (“activation”) passes through a link e, the activation usually experiences
     decay by a factor w(e):
                       O (e) = I(e) · w(e)
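
     A small numeric check of these two formulas, as a sketch: the helper name `link_signals` and the sample weights are made up for the example.

```python
def link_signals(F_v, weights, beta=1.0):
    """Return the lists I(e) and O(e) for the outgoing arcs of one node.

    F_v is the activation of the node; weights holds w(e) for each outgoing arc.
    """
    n = len(weights)                       # outdegree of the node
    I = [F_v / n ** beta for _ in weights]
    O = [i * w for i, w in zip(I, weights)]
    return I, O


weights = [1.0, 0.5, 0.5, 0.0]
I_one, O_one = link_signals(1.0, weights, beta=1.0)    # each arc gets 0.25
I_half, O_half = link_signals(1.0, weights, beta=0.5)  # each arc gets 0.5
```

     With beta = 1 the four inputs add up to F(v) = 1.0; with beta = 0.5 they add up to 2.0, so the node sends out more activation than it holds, as noted above.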




Generic description of input/output phase



     2. Input/Output of Node Activation
     Before the pulse, the node v has the activation level F(v).


     Through incoming links, v gets more activation:
                        Input(v) = Σ O(e)
     for all links e such that init(e) ∈ V_n, term(e) = v.


     By dissipating the activation through outgoing links, the node v might lose activation:
                        Output(v) = Σ I(e)
     for all links e such that init(e) = v, term(e) ∈ V_n.




Generic description of recomputation phase



     3. Computation of the New Level of Activation
     A new value F(v) is computed based on F(v), Input (v), and Output (v), for example
             Fnew(v) = F(v) + Input (v)
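
     Putting the three stages of the recomputation phase together for one pulse might look as follows. The function name `pulse` and the arc format `(init, term, w)` are assumptions; the update rule is the example rule Fnew(v) = F(v) + Input(v). A variant that conserves activation could also subtract Output(v), which is why it is returned here.

```python
def pulse(F, arcs, beta=1.0):
    """One recomputation pulse over the current list of nodes (the keys of F)."""
    outdegree = {}
    for a, _, _ in arcs:
        outdegree[a] = outdegree.get(a, 0) + 1

    inp = {v: 0.0 for v in F}
    out = {v: 0.0 for v in F}
    for a, b, w in arcs:
        if a in F and b in F:                 # only links inside the current list
            I = F[a] / outdegree[a] ** beta   # signal entering the arc
            out[a] += I                       # Output(a) = sum of I(e)
            inp[b] += I * w                   # Input(b)  = sum of O(e) = I(e) * w(e)

    F_new = {v: F[v] + inp[v] for v in F}
    return F_new, inp, out


F = {"a": 1.0, "b": 0.0, "c": 0.0}
arcs = [("a", "b", 0.5), ("a", "c", 1.0), ("b", "c", 1.0)]
F_new, inp, out = pulse(F, arcs)
```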




SAM and Methods of Numerical Simulation in Physics



 Spreading activation algorithms were introduced in the 1990s; however, the same iterative
 methods were used long before in numerical simulation in physics, mechanics, chemistry
 and engineering sciences. The major distinctions of these algorithms from what is now
 called spreading activation are:
   – a) in physics, such algorithms usually work on a regular mesh (so that the local
     topology of the graph is encoded into the formulas of the recomputation stage)
   – b) in physics, initial conditions (the initial activation) are usually assigned to all nodes
     on the mesh, and algorithms for efficient graph traversal are not needed. For
     instance, steps 2a (List Expansion) and 2c (List Purging) in the generic description of
     the SAM framework might be skipped.
 For instance, one-dimensional heat transfer equations might be numerically simulated on a
 one-dimensional mesh by iterative methods. On each iteration, the recomputation stage is
 based on the formula below:
         Fnew (v) = ( F(RightNeighbor(v)) + F(LeftNeighbor(v)) ) / 2
 Using a different formula, one can simulate the behavior of an oscillating string (although this
 will require storing three values at each node: the position, mass and velocity of the material
 point corresponding to the node).
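
 A minimal sketch of the heat-transfer iteration on a five-node mesh follows. The boundary treatment is an assumption: the two end nodes are held at fixed temperatures, so the values settle into a linear profile between them rather than a uniform level.

```python
def heat_step(F):
    """One iteration: every interior node takes the mean of its two neighbours."""
    new = F[:]                       # end nodes keep their values (assumed fixed boundaries)
    for i in range(1, len(F) - 1):
        new[i] = (F[i - 1] + F[i + 1]) / 2
    return new


rod = [1.0, 0.0, 0.0, 0.0, 0.0]      # hot left end, cold elsewhere
for _ in range(200):
    rod = heat_step(rod)
```

 No list expansion or purging is needed here, since every mesh node is active from the start and the neighbourhood of each node is known from its index.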
SAM and Methods of Numerical Simulation in Physics



     Using the same iterative algorithm, with one set of parameters one can emulate heat
     transfer; with another set of parameters the same algorithm will show us the behavior of
     oscillating strings. But the phenomena of heat propagation and string oscillation are quite
     different (for instance, heat propagation might lead to “thermal death” - the state of
     equilibrium where the level of activation is the same for all nodes, while oscillation might
     continue forever). Our illustration concerns only the basics, while real modeling might be much
     more complicated; for instance, heat transfer might lead to combustion, where after reaching
     some level of activation a node generates more “heat” than it gets from neighboring nodes.




Spreading Activation as a Graphmining Technique




     The technique of SAM is quite polymorphic. On this slide we interpret the results of
     spreading activation in terms of graph mining.
       – First of all, one can think that after running SAM the most activated nodes will be those
         nodes, which get the activation from multiple sources, or, in other words, those nodes
         which minimize the “distance” to the nodes which were initially activated. Therefore
         these nodes might be considered as potential centroids of strong clusters induced by the
         initial activation. Since partitioning of the nodes according to these clusters is not
         immediately available (and is not needed in many applications), SAM algorithms might
         be considered as methods of soft clustering.
       – On the other hand, the most activated nodes are those nodes, which are connected to
         the initial conditions by particular types of directed links (arcs with large weights).
         Therefore we might consider SAM as an efficient scheme for computing fuzzy
         inferencing. For such applications replacing a single valued function F by a vector
         function might be useful.
     We conclude by noting that SAM algorithms might be used for soft clustering and fuzzy
     inferencing on networks.

Γαλλία                  People




     Παρίσι


                       Ναπολέων            Αλέξανδρος




        Geographical
          artifacts
                                      Relations
                                      • Friends
                                      • Part of, Instance of, Subclass
                                      • Created




France                                                           Russia




     Paris                                                                       Moscow


                      Napoleon                          Alexander



                                                                                         Borodino


                                                                    Kutuzov

                                      Meeting:
                                 Battle of Austerlitz




                                      Meeting:
                                 Battle of Borodino




                               Project:
                          Invasion of Russia


Diagram on the previous slide …




     What does it represent?
     How can it be used?




     [Diagram repeated from the earlier slide. Nodes: France, Russia, Paris, Moscow, Napoleon,
     Alexander, Kutuzov, Borodino; Meeting: Battle of Austerlitz; Meeting: Battle of Borodino;
     Project: Invasion of Russia]

     How could this diagram be used?
     1. A network flow process could show the nodes most relevant to the pair “Napoleon” & “Meeting”
       – Selection of WHO: whom to invite
       – Other nodes explain the recommendations
     2. When Napoleon opens an email or a web page containing W&P, he will be advised that the
        content of this resource is relevant to his project “Invasion of Russia”
Diagram on the previous slide … What does it represent?



     Data from Facebook, data from Napoleon’s Lotus Notes calendar, the structure of a Wiki,
     a network of collocations or relations between the entities in W&P, …
      – The proliferation of Web 2.0 and Enterprise 2.0 technologies has led to the emergence
        of massive networks connecting people and various digital artifacts. These networks can
        be treated as “weak” knowledge, which nevertheless might be used for recommendations
        and even for such traditional applications as knowledge-based text processing
     Or an instantiation of an ontology related to W&P by Leo Tolstoy
      – In which case we would probably know that Napoleon is the emperor of France, Paris is the
        capital (not an instantiation of a subclass) of France, etc.
     An ontology provides conceptualization and allows inferencing, but these advantages per se are
     useless without tedious manual work to encode the rules for using this additional
     knowledge. By contrast, the knowledge encoded in the topology of the multidimensional network is
     ready to use, provided that the methods are tolerant to errors and inconsistencies in the data, i.e.
     the methods are methods of “soft mathematics”: fuzzy inferencing, soft clustering, …




Social Context = Knowledge ?

                 A New Mathematical Model of Horse Racing
     Assume, without loss of generality, that each horse in the
     race is modelled by a wooden ball of radius Ri.




                                          = a ball ? ☺



     Representing social context as knowledge allows us to
     benefit from the experience of knowledge-based
     applications.




     For instance, the social context modeled as a network is not much different from the semantic networks
     which are formed from concepts represented in ontologies, and it is possible to use such networks
     for knowledge-based text processing. Representing social context as knowledge allows us to draw
     on experience from a mature R&D area such as knowledge-based text processing.




How to model the social context



     As multidimensional networks
      – The primary source - network models of instantiations of techno-social systems
     As a “Knowledge” – represented as objects, clauses, XML, graphs, some combination of
     these




The primary source – network models of techno-social systems




     [Diagram: edges labelled “Invited”, “Joined” and “Created”]

     Log files of techno-social systems (like Facebook or IBM’s Lotus Connections)
     keep track of who did what. These triples can be aggregated into a network.
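
     As a sketch of that aggregation step: the log records below are invented for illustration, and counting repeated triples to obtain an importance value is one possible choice, not something prescribed by the slides.

```python
from collections import Counter

# Hypothetical log records of the form (actor, action, object).
log = [
    ("napoleon", "created", "project:invasion"),
    ("napoleon", "invited", "kutuzov"),
    ("kutuzov", "joined", "meeting:borodino"),
    ("napoleon", "joined", "meeting:borodino"),
    ("napoleon", "invited", "kutuzov"),
]


def aggregate(triples):
    """Collapse repeated triples into typed arcs whose importance is a count."""
    return Counter(triples)


arcs = aggregate(log)                                # (init, type, term) -> imp
nodes = {n for s, _, o in arcs for n in (s, o)}      # people and artifacts alike
```

     The action name plays the role of the link type, so people and digital artifacts end up in one multidimensional network.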




Examples of Graph Models:
Folksonomies: – Tripartite Hypergraph


     Social bookmarking systems (Del.icio.us, …)
      – Where to keep my bookmarks?
      – Users (actors), resources, tags
     In social bookmarking systems users describe bookmarks by keywords called tags. The
     structure behind these social systems, called folksonomies, can be viewed as a tripartite
     hypergraph of actor, tag and resource nodes.
       – Three types of first-class citizens, plus hyperedges (“hyperplanes”)
       – If the hyperplanes were made of rubber, they could be shrunk to a node, so the
         hyperplanes would also be first-class citizens
     Advantages of the network models (see next slide)
      – Extensibility
      – Ease of merging heterogeneous information
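
     The “shrinking” of a hyperedge to a node can be sketched as follows. The tag assignments and the node labelling scheme (a kind and a name per node) are assumptions made for the example.

```python
def reify(assignments):
    """Turn each (user, tag, resource) hyperedge into a node with three ordinary edges."""
    edges = []
    for i, (user, tag, resource) in enumerate(assignments):
        h = ("assignment", i)                  # the hyperedge, now a node of its own
        edges += [
            (h, ("user", user)),
            (h, ("tag", tag)),
            (h, ("resource", resource)),
        ]
    return edges


# Hypothetical tag assignments.
assignments = [
    ("ann", "graphs", "http://example.org/a"),
    ("bob", "graphs", "http://example.org/b"),
]
edges = reify(assignments)
nodes = {n for edge in edges for n in edge}
```

     The result is an ordinary graph with four kinds of nodes, so the same network methods apply and further node types can be merged in later.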




Source:   Hypergraphs: see Jäschke et al., “Logsonomy – A Search Engine Folksonomy”, ICWSM 2008, AAAI Press (2008)
Inferencing – “Soft methods” could provide reliable inferencing

     For instance, the social context modeled as a network is not much different from the semantic networks
     which are formed from concepts represented in ontologies, and it is possible to use such networks
     for knowledge-based text processing. Representing social context as knowledge allows us to draw
     on experience from a mature R&D area such as knowledge-based text processing.




Inferencing



     Terms are ambiguous, and our knowledge is never “the truth, the whole truth, and nothing
     but the truth”
       – Malahide, Co. Dublin
       – Malahide is a township in Elgin County, Ontario, Canada.
       – Paradis Gisenyi Malahide is a hotel in Rwanda
     Solution (Troussov et al. MITACS, Canada, 2010 ): propagation from multiple concepts, for
     instance, the initial seed for the activation propagation starts at two nodes in a geographical
     taxonomy: Malahide (Ontario) and Malahide (Co. Dublin) as well as from other concepts
     mentioned in the text
           • Text which mentions Malahide and Europe – is a little bit more likely to be about
             Ireland than about Canada
           • Text which mentions Malahide and Clontarf – is more likely to be about
            Ireland than about Canada
          • …
          • Cohesive, coherent text which mentions Malahide, Mulhuddart, Lansdowne,
            Clontarf and Donabate – is almost certainly about Dublin
     Such a rapid “phase transition” from uncertainty to certainty is similar to the
     transition at a percolation threshold

From Uncertainty to Certainty in Inferencing: phase transitions as a function
of seed size, in analogy to those in percolation


     In (semantic) networks with high local density,
     the reliability of inferencing from a single concept is almost never sufficient, and
     reliability could be low when inferencing starts from a small number of seed concepts,
     but inferencing becomes very reliable once the number of initial seed
     concepts reaches some level (which could be explained by combinatorics)




     [Plot: reliability of inferencing (vertical axis) versus number of nodes in the seed (horizontal axis)]
And could be explained by combinatorics




     A graph showing the approximate probability of at least two people sharing a birthday
     amongst a certain number of people.
     In probability theory, the birthday problem, or birthday paradox, pertains to the probability
     that in a set of randomly chosen people some pair of them will have the same birthday. By
     the pigeonhole principle, the probability reaches 100% when the number of people reaches
     366 (ignoring February 29 births). But perhaps counter-intuitively, 99% probability is reached
     with a mere 57 people, and 50% probability with 23 people.
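
     The figures quoted above can be checked directly. The sketch assumes 365 equally likely birthdays; the function name is made up for the example.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days   # person k+1 avoids the k birthdays already taken
    return 1.0 - p_all_distinct


p23 = shared_birthday_probability(23)
p57 = shared_birthday_probability(57)
p366 = shared_birthday_probability(366)
```

     The probability grows slowly at first and then climbs steeply, which is the kind of sharp transition the slide on seed size refers to.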
Simulation
     The network (such as a taxonomy of geographical
     locations) is a tree of 20,000 nodes. Text is modeled
     as a list of 100 terms, each of which is ambiguous and
     could be mapped to 8 network nodes. When such a
     mapping happens, we consider that the node (the
     geographical location represented by the node) could
     be relevant to the text.
     We are looking for clusters: groups of N
     nodes, each of which is mentioned in the text, such that the
     graph distance between each pair of nodes in the
     cluster is less than three.
     Such graph structures have a low probability of
     occurrence for small N (N=1 or 2), and their probability
     sharply decreases to zero for bigger N;
     correspondingly, our certainty that the graph structure
     signifies the topicality of the text increases to 1.0
       – Text which mentions Malahide, Mulhuddart,
         Lansdowne, Clontarf, Donabate is almost certainly
         about Dublin (Ireland)
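
     A rough Monte Carlo sketch of such an experiment is given below. It is not the cited simulation: the tree shape (a random recursive tree), the uniform random choice of the 800 mapped nodes, the number of trials and all function names are assumptions, so the numbers it produces should not be expected to match the original study. It estimates how often a cluster of at least N mentioned nodes appears purely by chance.

```python
import random


def random_tree(n, rng):
    """Random recursive tree: node i attaches to a uniformly chosen earlier node."""
    return [None] + [rng.randrange(i) for i in range(1, n)]


def largest_cluster(parent, marked):
    """Largest number of marked nodes inside one closed neighbourhood.

    In a tree, a set of nodes whose pairwise distances are all below three
    always fits inside the closed neighbourhood of a single node.
    """
    base = [1 if v in marked else 0 for v in range(len(parent))]
    count = base[:]                      # each node counts itself
    for v, p in enumerate(parent):
        if p is not None:
            count[p] += base[v]          # a child lies in its parent's neighbourhood
            count[v] += base[p]          # the parent lies in the child's neighbourhood
    return max(count)


def estimate(trials=20, n=20_000, terms=100, senses=8, seed=0):
    """Fraction of random texts that contain a cluster of at least N nodes."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(trials):
        parent = random_tree(n, rng)
        marked = set(rng.sample(range(n), terms * senses))
        sizes.append(largest_cluster(parent, marked))
    return {N: sum(s >= N for s in sizes) / trials for N in range(1, 8)}


chance = estimate()
```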

Source: F. Darena and A. Troussov 2010

Processes in Networks



 How do we study the Earth?
  – By looking at the results of the propagation of
    waves through the Earth
    [Figure: propagation of a seismic wave in the ground
    and the effect of the presence of a land mine]
 Similarly, one can study networks
 by network flow methods
   – introducing processes where something
     is flowing from node to node across the
     edges




Processes



     Used goods - trail
     Money - walk
     Gossip - replication rather than transference (trails rather than walks)
     E-mail - diffusion by replication
     Attitudes - spread through replication rather than transfer
     Infection - spreads like gossip, but does not re-infect
     Packages - usually the shortest route possible
     Relevancy in semantic networks
     Trust - Shortest path or volume?




We are talking about the consumability of centrality measurements
produced by network flow methods like these       (DEMO)




Key difference between SNA and other approaches to social science



     Social sciences usually focus
     on the attributes of individual actors




Key difference between SNA and other approaches to social science

     SNA focuses on relationships
     between actors
     “Social network analysis reflects a shift from the
     individualism common in the social sciences towards a
     structural analysis”.
       Garton et al. Studying Online Social Networks
     Structuralism is an approach to the human sciences
     that attempts to analyze a specific field (for instance,
     mythology) as a complex system of interrelated parts.
      the linguists Roman Jakobson and Nikolai Trubetzkoy
     the anthropologist Lévi-Strauss
     ~ Complex systems
     Sociogram:
        – Jacob Levy Moreno (1889-1974) was a leading Austrian-American
          psychiatrist and psychosociologist, thinker and
          educator, the founder of psychodrama, and the foremost
          pioneer of group psychotherapy.
          Among Moreno’s primary contributions to sociometrics was
          the sociogram. The sociogram is a method of representing
          individuals as points on graphs and using lines and arcs to
          represent the relationships between the individuals.

     Graphics from Prof. Hendrik Speck's tutorial at 5th Karlsruhe Symposium for Knowledge
     Management in Theory and Praxis, 2007
Prominence



     The study of the structural properties of networks and their interplay with the processes taking
     place on the network has been one of the main problems in complex network analysis in
     recent years.
     A primary use of graph theory in social network analysis is to identify “important”
     actors.
     Centrality and prestige concepts seek to quantify graph theoretic ideas about an individual
     actor’s prominence within a network by summarizing structural relations among the graph
     nodes.
     An actor’s prominence reflects its greater visibility to the other network actors (an audience).
     An actor’s prominent location takes account of the direct sociometric choices made and
     choices received (outdegrees and indegrees), as well as the indirect ties with other actors.
     The two basic prominence classes:

      – Centrality: Actor has high involvement in many relations, regardless of send/receive
        directionality (volume of activity)
      – Prestige: Actor receives many directed ties, but initiates few relations
        (popularity > extensivity)
Source:   Wasserman & Faust, “Social Network Analysis” (W&F)
Centrality: Eigenvector Centrality

     Eigenvector centrality was introduced by Phillip Bonacich in 1972; his power centrality measure followed in 1987
     “Google's workhorse search engine ranking algorithm, PageRank, is actually a variant on an
     SNA concept - Bonacich Power Centrality.
      – Bonacich (1987) hypothesized that someone's power in society depends on the power of his or her
        social contacts. Bonacich formalized this mathematically:
              c_i = B (c_1 R_{i1} + c_2 R_{i2} + ... + c_n R_{in}) ,
        where c_i is the person in question, B is the magnitude of the effect, and R_{ij} is the strength of the
        relationship between the person in question, i, and each of the other people, j, under consideration.
        If B=1 , the formula becomes eigenvector centrality, of which PageRank is a variant. Now, Page, et
        al. (1998) do not cite Bonacich, I am not claiming that they stole the idea - I am merely stating that a
        social network analyst appears to me to have been the first to think up the concept”.
                          Solomon Messing http://www.stanford.edu/~messing/RforSNA.html
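
     The recursive definition quoted above can be solved numerically by repeated substitution (power iteration). The following is a minimal sketch, not code from the talk: the matrix R, the four-actor example and the function name are hypothetical, and the factor B is assumed to be absorbed by rescaling the scores after each pass.

```python
def eigenvector_centrality(R, iterations=100):
    """Power iteration for c_i = B * sum_j R[i][j] * c[j].

    R is a square list-of-lists of non-negative tie strengths.
    The scale factor B is absorbed by rescaling c after every pass.
    """
    n = len(R)
    c = [1.0] * n  # start by assuming all actors are equally central
    for _ in range(iterations):
        new_c = [sum(R[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = max(new_c)
        if norm == 0:
            return c  # no ties at all: nothing to propagate
        c = [v / norm for v in new_c]
    return c


# Hypothetical 4-actor network: actor 0 is tied to everyone,
# actors 1 and 2 are also tied to each other, actor 3 only to actor 0.
R = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(R)
```

     In this toy network the best-connected actor ends up with the highest score, and the actor whose only tie is to that hub still inherits part of its prominence.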




Centrality and the network flow methods



     Most centrality measures are based on the network flow process, “that focuses on
     the outcomes for nodes in a network where something is flowing from node to node across
     the edges” (Borgatti and Everett, 2006)
     We interpret this “something” as a relevancy measure; for instance, the initial seed input
     value which shows nodes of interest in the network. Propagating the relevancy measure
     through outgoing links allows us to compute the relevancy measure for other network nodes
     and dynamically rank these nodes according to the relevancy measures.
     The same paradigm could be used to address the centrality measurements in social network
     analysis. Centralisation of the network can be achieved when we assume that all the nodes
     are equally important, and iteratively recompute the relevancy measure based on the
     connections between nodes.
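
     The propagation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the algorithm used in the talk: the decay factor, the even split over outgoing links, the number of steps and the toy graph are all hypothetical choices.

```python
def propagate_relevancy(out_links, seeds, decay=0.5, steps=3):
    """Spread an initial relevancy measure along outgoing links.

    out_links: dict node -> list of successor nodes
    seeds:     dict node -> initial relevancy (the "nodes of interest")
    decay:     fraction of a node's relevancy passed on per step (assumed)
    """
    total = dict(seeds)      # accumulated relevancy per node
    frontier = dict(seeds)   # relevancy that is still travelling
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            targets = out_links.get(node, [])
            if not targets:
                continue
            share = decay * value / len(targets)  # split evenly over out-links
            for t in targets:
                next_frontier[t] = next_frontier.get(t, 0.0) + share
        for node, value in next_frontier.items():
            total[node] = total.get(node, 0.0) + value
        frontier = next_frontier
    # rank nodes by accumulated relevancy, highest first
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical network: the query node links to a and b, which link on to c and d.
out_links = {"query": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": []}
ranking = propagate_relevancy(out_links, {"query": 1.0})
```

     Seeding every node with the same value instead of a single query node turns the same routine into a crude centrality computation, which is the "all nodes are equally important" case mentioned above.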




Master Equation                            Numerical Solution



     Bonacich Power Centrality, Eigenvector Centrality, Google’s PageRank

      – “Google's workhorse search engine ranking algorithm, PageRank, is actually a variant on
        an SNA concept - Bonacich Power Centrality. Bonacich (1987) hypothesized that
        someone's power in society depends on the power of his or her social contacts.
        Bonacich formalized this mathematically:
             c_i = B (c_1 R_{i1} + c_2 R_{i2} + ... + c_n R_{in}) ,
        where c_i is the person in question, B is the magnitude of the effect, and R_{ij} is the
        strength of the relationship between the person in question, i, and each of the other
        people, j, under consideration.
        If B=1 , the formula becomes eigenvector centrality, of which PageRank is a variant.
        Now, Page, et al. (1998) do not cite Bonacich, I am not claiming that they stole the idea -
        I am merely stating that a social network analyst appears to me to have been the first to
        think up the concept”.
                           Solomon Messing http://www.stanford.edu/~messing/RforSNA.html




Master Equation                    Numerical Solution




                                                       Computation




     Master equation easily leads us to a numerical solution




It is great to have “the right master equation”!
What is the shape of a hanging chain?

             What is the shape of a hanging chain when
              supported at its ends and acted on only by
              its own weight?
                 • Galileo: “This chain will assume the form
                   of a parabola”
                       y = x^2
                 • But the shape is different:
                       y = (a / 2) (e^{x/a} + e^{-x/a})
                   which was established later by applying
                   calculus
                       Plotting geometric arrangements and forces acting on small
                       segments of the chain
                       Integrating the results


     “In 1669, Jungius disproved Galileo's claim that the curve of a chain
         hanging under gravity would be a parabola (MacTutor Archive). The
         curve is also called the alysoid and chainette. The equation was
         obtained by Leibniz, Huygens, and Johann Bernoulli in 1691 in
         response to a challenge by Jakob Bernoulli”.
                                 http://mathworld.wolfram.com/Catenary.html
     Graphics: Leibniz's solution is on the left. Huygens's illustration is on the right.
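
     How far Galileo's parabola is from the true curve can be checked numerically. The sketch below evaluates the catenary y = (a / 2) (e^{x/a} + e^{-x/a}) and a parabola forced through the same lowest point and the same two supports; the parameter a = 1, the half-span of 1 and the function names are assumptions made for illustration.

```python
import math


def catenary(x, a=1.0):
    """y = (a / 2) * (e^(x/a) + e^(-x/a)), i.e. a * cosh(x / a)."""
    return (a / 2.0) * (math.exp(x / a) + math.exp(-x / a))


def matching_parabola(x, a=1.0, half_span=1.0):
    """Parabola through the same lowest point and the same two supports."""
    lowest = catenary(0.0, a)
    rise = catenary(half_span, a) - lowest
    return lowest + rise * (x / half_span) ** 2


# Tabulate the gap between the two curves between the lowest point and a support.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    gap = matching_parabola(x) - catenary(x)
    print(f"x={x:4.2f}  catenary={catenary(x):.4f}  gap={gap:+.4f}")
```

     The two curves agree at the lowest point and at the supports by construction, but differ in between, which is why the "right master equation" matters.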
“Plotting geometric arrangements and forces acting on small segments” evolved into
       – Finite difference method
            • In mathematics, finite-difference methods are numerical methods for approximating
              the solutions to differential equations using finite difference equations to approximate
              derivatives.
       – Stencil
            • In mathematics, especially the areas of numerical analysis concentrating on the
              numerical solution of partial differential equations, a stencil is a geometric
              arrangement of a nodal group that relates to the point of interest by using a numerical
              approximation routine. Stencils are the basis for many algorithms to numerically
              solve partial differential equations.
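
     A minimal sketch of both ideas, assuming a one-dimensional model problem chosen only for illustration: u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0. The second derivative is replaced by a three-point stencil, and each node is repeatedly recomputed from its two neighbours (Jacobi iteration). The grid size and the number of sweeps are arbitrary choices.

```python
def solve_poisson_1d(f, n=11, sweeps=2000):
    """Solve u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0.

    The second derivative is replaced by the three-point stencil
        (u[i-1] - 2*u[i] + u[i+1]) / h**2,
    and the resulting equations are solved by repeated local updates
    (Jacobi iteration): each interior node is recomputed from its neighbours.
    """
    h = 1.0 / (n - 1)
    xs = [i * h for i in range(n)]
    u = [0.0] * n  # boundary values stay at zero
    for _ in range(sweeps):
        new_u = u[:]
        for i in range(1, n - 1):
            new_u[i] = 0.5 * (u[i - 1] + u[i + 1] - h * h * f(xs[i]))
        u = new_u
    return xs, u


# Model problem with a known exact solution: u'' = -2 gives u(x) = x * (1 - x).
xs, u = solve_poisson_1d(lambda x: -2.0)
```

     The update rule only ever looks at a node and its immediate neighbours, which is the same local, iterative pattern that reappears later in spreading activation on networks.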




Numerical Solution                                NO Master Equation



     “Integrating” evolved into …
       – Well, in financial mathematics solutions are tuned on “stencils”.
         Numerical solutions are known.
              Master equation is not known,
              and is not interesting to know.
     “Master equation is not known” – this is ok.
      – But we need to be aware of emergent effects in complex systems:
        learning how to do something right on a small scale doesn’t necessarily imply that we’ll
        do the right things on a bigger scale




Leibniz, Huygens, and Johann Bernoulli knew geometry and mechanics. We don't know the
     "geometry" and "mechanics” of techno-social systems (and we don’t even know the "geometry"
     and "mechanics” of semantic networks, social networks, …),
     but we can create small "nodal arrangements" modeling multidimensional networks (for
     instance, folksonomies).
     We can apply known and novel numerical algorithms and use state-of-the-art knowledge to decide
     which algorithms provide better results.
     The next step is to check whether good properties of the numerical solutions on the micro-level hold
     true on the meso-level




Source: Troussov at MITACS Workshop in Vancouver, Canada, 2010

Recommender systems and global/local ranking



     Link analysis is frequently employed for ranking and navigation
     Graph-based recommender systems should recommend
           “Important” objects (nodes, links, subgraphs)
     which are also
        – Close enough to the initial points of interest (query, focus, initial seed)
           (for instance, in physical space)
     Global ranking ~ PageRank
     Breadth first search (BFS) ? Local Ranking !?




     Recommending a suitable restaurant near New York's 9th Avenue (next slide)
     or the music you might like, the advertisement you should see, etc.

Graphics:   http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/

Global Ranking (like Google’s PageRank) –
a view of the network from an external point: the modern, “Copernican” approach




Source: NOAA

Local Ranking – is needed for recommenders – should rely on an Ego-centered
Ptolemaic view (actually, Poly-Centered, see next slide)

     LOCAL RANKING
     Ego-centered or "personal" networks
     provide Ptolemaic views of their
     networks from the perspective of the
     persons (egos) at the centers of their
     networks.




Graphics:   http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/

POLY-CENTRIC
     In physical space, navigation
     is from one point to another.
     In applications to virtual spaces,
     navigation is not simply
     browsing from a single object
     to another, but dealing with
     several objects at the same
     time.
     For instance, to get better
     results in Google we add
     terms, we remove terms, …
     To compute the recommendation
     “Whom to invite to the meeting”,
     one can start navigation from
     two objects representing the
     user whom the recommendation is
     for and the meeting in question








     Graph-based recommender systems should
     recommend
             “Important” objects (nodes)
     which are also located
             Close to the initial points of interest (query,
     initial seed)
     One of the leading approaches in recommenders is:
       Results of Global Ranking (Link analysis)
      are “filtered” according to their proximity to the query
     In this paper we introduce novel algorithms which could
     replace the two-step procedure mentioned above with a single
     step:
              Local Ranking
     which simultaneously computes proximity and importance
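
     The slides do not spell out the novel algorithms, so the sketch below substitutes a well-known stand-in that has the same one-step flavour: a random walk with restart (personalised PageRank). Scores are high for nodes that are both well connected and close to the seeds. The restart probability, the toy graph and all node names are hypothetical.

```python
def local_rank(neighbours, seeds, restart=0.3, iterations=50):
    """Random walk with restart (personalised PageRank) on an undirected graph.

    neighbours: dict node -> list of adjacent nodes (every node has at least one)
    seeds:      list of nodes forming the query / initial points of interest
    restart:    probability of jumping back to a seed at each step (assumed)
    """
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in neighbours}
    score = dict(seed_mass)
    for _ in range(iterations):
        new_score = {n: restart * seed_mass[n] for n in neighbours}
        for node, value in score.items():
            share = (1.0 - restart) * value / len(neighbours[node])
            for nb in neighbours[node]:
                new_score[nb] += share
        score = new_score
    return score


# Hypothetical graph: "me" reaches a hub through a or b; "far" hangs off z.
neighbours = {
    "me": ["a", "b"],
    "a": ["me", "hub"],
    "b": ["me", "hub"],
    "hub": ["a", "b", "x", "y", "z"],
    "x": ["hub"],
    "y": ["hub"],
    "z": ["hub", "far"],
    "far": ["z"],
}
score = local_rank(neighbours, ["me"])
```

     Passing several seeds at once (for example a user and a meeting) gives the poly-centric variant: the walk restarts from any of them with equal probability.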
Web and Communities

     Communities in Social Sciences: A tribe learning to survive, a group of engineers working on similar
     problems, …
     Communities in computer science - any empirically found group of people




     “Recent advances in digital technologies invite consideration of organizing as a process that is
     accomplished by global, flexible, adaptive, and ad hoc networks that can be created, maintained,
     dissolved, and reconstituted with remarkable alacrity”.
                                   Prof. N. Contractor




Community detection … but What is a Community?



     Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a
     practitioner? Yes.
       – Communities easily overlap, with multiple membership and fuzzy belonging
     At the same time, some communities SHOULD be kept separate
       – Remember “Strange Case of Dr Jekyll and Mr Hyde” (Robert Louis Stevenson, 1886).
            • How Google had failed to understand an essential property of real-world
              social networks
            • So by testing their social service inside a single context (Google employees only),
              the developers failed to notice that in real life, people participate in multiple
              contexts (family, work, friends, etc) that they work actively to keep
              separate. The reasons for wanting to keep these groups separate can range from
              wanting to keep an illicit affair secret from your spouse to political activists in
              oppressive regimes wanting to keep certain connections secret from the
              government. Another important reason to keep our communities separate, is that
              we often play different roles - and communicate differently

               http://www.iq.harvard.edu/blog/netgov/2010/03/worlds_colliding.html


New methods for community detection are needed



  Multiple membership
   – Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a
     practitioner? Yes. …
  Fuzzy belonging
   – We don’t know the social structures behind on-line “communities”:
     members of an on-line community don’t necessarily have the sense of identity that
     members of real-life social communities have; on-line communities could be project teams or
     networks of knowledge, …
  High performance and scalability (agglomerative, local, …)
    – Clustering as simple partitioning is ruled out because of multiple membership
    – Clustering as partitioning is not possible in real time for many business applications
          • IBM Intranet: 400K employees, 10K on-line communities (the biggest with 23K
             members), ...
   Contextualisation of Community Detection
    – Collaborative filtering systems provide recommendations based on the detection of
      like-minded users. But the user of a techno-social system whom the prediction is for could be
      a "Mathematician", "Irish", etc., or a kind of Dr. Jekyll / Mr. Hyde person (see next
      slide)
An example of clustering around a node using propagation




Future work in local dynamic clustering



     Troussov et al “Vectorised Spreading Activation” 2010 theorize that the future development
     of spreading activation (SA) methods might be driven by
                              “physics-inspired”
                                     and
                          “logic-inspired” algorithms
      – SA algorithms have roots in numerical simulation of various physics phenomena,
        particularly by finite difference methods.
      – On the other hand, the iterative procedure of SA is essentially the same as the
        procedure that determines the new state of a cell in cellular automata such as Conway’s
        Game of Life. Although cellular automata usually perform on rectangular (cubic, etc.)
        grids, the extension to arbitrary networks is feasible.
        ~ Marker propagation, MajorClust, Chinese whispers graph clustering algorithm, …




Conway's Game of Life




Logic-inspired VSA



     Finite difference approximations to differential equations were among the precursors of cellular
     automata (Stephen Wolfram "A New Kind of Science") and of the method of spreading
     activation (Troussov et al 2009)
     Iterative computational procedures in cellular automata are the same as in SA.
     The identity of the computational procedures makes it possible to develop VSA algorithms with hybrid
     operations over the components of the activation vector.
       – For instance, “physical” operations could be responsible for the propagation of the
         activation around the initial seeds; the level of the activation indicates the relevancy of
         the nodes to the initial seeds.
       – “Logical” operations could propagate markers, which indicate the potential belonging of
         nodes to clusters.

     Such hybrid operations will combine ranking with clustering, and are
     computationally efficient on massive networks since the most time-consuming operation,
     retrieval of nodes, serves both “physical” and “logical” operations. The clustering does not
     involve partitioning of the whole network.
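
     One possible reading of such a hybrid operation is sketched below; it is a hypothetical illustration, not the VSA implementation. Each node carries a two-component activation vector: a number that decays as it spreads (the "physical" part) and a set of seed markers that have reached the node (the "logical" part). Both are updated in the same pass over the neighbours. The decay factor, step count and node names are assumptions.

```python
def hybrid_spread(neighbours, seeds, decay=0.5, steps=2):
    """One traversal updates two components of each node's activation vector:

    - a "physical" component: a number that decays as it spreads from seeds;
    - a "logical" component: the set of seed markers that have reached the node.
    """
    activation = {n: 0.0 for n in neighbours}
    markers = {n: set() for n in neighbours}
    for s in seeds:
        activation[s] = 1.0
        markers[s].add(s)
    frontier = set(seeds)
    for _ in range(steps):
        new_activation = dict(activation)
        new_markers = {n: set(m) for n, m in markers.items()}
        next_frontier = set()
        for node in frontier:
            share = decay * activation[node] / len(neighbours[node])
            for nb in neighbours[node]:  # a single retrieval serves both operations
                new_activation[nb] += share          # "physical": ranking
                new_markers[nb] |= markers[node]     # "logical": cluster markers
                next_frontier.add(nb)
        activation, markers, frontier = new_activation, new_markers, next_frontier
    return activation, markers


# Hypothetical chain: expert - topic1 - person - topic2 - university
neighbours = {
    "expert": ["topic1"],
    "topic1": ["expert", "person"],
    "person": ["topic1", "topic2"],
    "topic2": ["person", "university"],
    "university": ["topic2"],
}
activation, markers = hybrid_spread(neighbours, ["expert", "university"])
```

     Here the node in the middle of the chain receives markers from both seeds, so it belongs to both overlapping clusters, while its activation level gives its rank. No partitioning of the whole network is involved.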


VSA & Marker propagation – combining ranking with clustering




                                                       My University




An Expert


                                                A topic
                                                I’m interested in

VSA & Clustering (Cont.)




Tasks / Methods



 Terminology varies across domains (for instance, from the point of view of IM, many tasks fall into the
 category of hidden knowledge discovery)


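 Multidimensional network        | Techno-Social Systems     | Networks Theory and Graph
 point of view (A.T.)            | tasks                     | Theory terminology
 --------------------------------+---------------------------+---------------------------
 Centralisation                  | Recommender Systems       | Random walks
                                 | PageRank etc              | Eigenvector centrality
                                 | Expertise location        |
 --------------------------------+---------------------------+---------------------------
 Local topology                  | Recommender systems       | Motifs
                                 | Link prediction           |
 --------------------------------+---------------------------+---------------------------
 Ad hoc generalisation across    | Expertise location        | Clustering
 dimensions                      | Recommender Systems       |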



Tasks



     Avenues to deep socio-semantic analytics and the possibility of high-quality
     functionalities for techno-social systems (like recommending people to
     invite into your social network) hinge on the availability of engines which are able
       – to provide hidden knowledge discovery, such as
              • determining the structural importance of nodes
              • discovering a new relation in a network
                      (for instance, based on the strength of multiple connectivity between the nodes
                      of a social network, one can conclude
                      that Dr. Jekyll is related to Mr. Hyde)
              • providing ad hoc generalisation across dimensions,
                 for instance, the ability to detect that a particular person might serve as a
                 representative of a community or as an expert on a particular topic (an example
                 of such generalisation is the expression frequently attributed to Louis XIV, "L'État,
                 c'est moi" (I am the State))




“Three steps away”   ?


      John B.            Axel P.      Dan B.          Tim B.




                         Why did the recommender decide
                         that this three-steps-away
                         connection is a strong
                         connection?
John and Tim –
 The recommender computes that this is a
 strong connection because the two are
 connected in multiple ways

 Shortest Path vs. Volume of traffic
                                       Friends-of-Friends




                                                            Interest
Workplace




John and Derek

The recommender computes
that this type of
connectivity is a weak
connection
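
The contrast between shortest path and volume of traffic can be made concrete with a small sketch. Both pairs below are three steps apart, but one pair is joined by three parallel routes and the other by a single route. The scoring rule (attenuated walk counts, in the spirit of the Katz index), the attenuation factor and the intermediate node names are hypothetical and are not taken from the recommender itself.

```python
def undirected(edges):
    """Build an adjacency dict from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def connection_strength(neighbours, source, target, attenuation=0.5, max_steps=3):
    """Sum attenuated walk counts from source to target (a Katz-style index)."""
    walks = {source: 1}  # number of walks of the current length ending at each node
    strength = 0.0
    for step in range(1, max_steps + 1):
        next_walks = {}
        for node, count in walks.items():
            for nb in neighbours.get(node, []):
                next_walks[nb] = next_walks.get(nb, 0) + count
        walks = next_walks
        strength += (attenuation ** step) * walks.get(target, 0)
    return strength


# Hypothetical network: three routes John..Tim (workplace, interest, friends),
# one route John..Derek; every route is exactly three steps long.
graph = undirected([
    ("John", "w1"), ("w1", "w2"), ("w2", "Tim"),
    ("John", "i1"), ("i1", "i2"), ("i2", "Tim"),
    ("John", "f1"), ("f1", "f2"), ("f2", "Tim"),
    ("John", "c"), ("c", "d"), ("d", "Derek"),
])
john_tim = connection_strength(graph, "John", "Tim")
john_derek = connection_strength(graph, "John", "Derek")
```

A shortest-path measure cannot tell the two pairs apart, whereas the volume-based score for John and Tim is three times the score for John and Derek.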




Tasks: Generalisation Across Domains - Whom is Claudia connected with?

       [Diagram: Claudia, a group labelled “All of these people” (Dirk, Martin, Elaine, John, Hanna), and the node “Researcher”]


Ranking

     [Three successive diagrams showing numbered node rankings (1, 2, 3, …)]
Nepomuk Recommender



     NEPOMUK (Networked Environment for Personalized, Ontology-based Management of Unified
     Knowledge) is an open-source software specification that is concerned with the development of a social
     semantic desktop that enriches and interconnects data from different desktop applications using semantic
     metadata stored as RDF.
     Initially, it was developed in the EU 6th framework integrated project Nepomuk (2006-2008) - 17 million
     Euros, of which 11.5 million was funded by the European Union




Nepomuk Recommender (Cont.)



     Troussov et al “Social Context as Machine Processable Knowledge” presented the
     architecture of the hybrid recommender system in the activity centric environment Nepomuk-
     Simple (EU 6th Framework Project NEPOMUK).
     “Real” desktops usually have piles of things on them, in which the users (consciously or
     unconsciously) have grouped together items which are related to each other or to a task. The so-called
     “Pile” UI used in Nepomuk-Simple imitates this type of data and metadata
     organisation, which helps to avoid premature categorisation and reduces the retention of
     useless documents.
     Metadata describing the user data are stored in the Nepomuk personal information
     management ontology (PIMO). Proper recommendations, such as recommendation of
     additional items to add to the pile, should evidently be based on the PIMO and on the textual
     content of the items in the pile. Although methods of natural language processing for
     information retrieval could be useful, the most important type of textual processing is that
     which makes it possible to relate concepts in PIMO to the processed texts. Since PIMO changes over
     time, this type of natural language processing can’t be performed as preprocessing of all
     textual context related to the user. Hybrid recommendation needs on-the-fly textual
     processing with the ability to aggregate the current instantiation of PIMO with the results of
     textual processing.

Nepomuk



     Representing and modeling this ontology as a multidimensional network makes it possible to augment
     the ontology on the fly with new information, such as the “semantic” content of the textual
     information in user documents. Recommendations in Nepomuk-Simple are computed on
     the fly by graph-based methods operating on the unified multidimensional network of
     concepts from the personal information management ontology augmented with concepts
     extracted from the documents pertaining to the activity in question.
     Troussov et al. 2008 classify Nepomuk-Simple recommendations into two major types.
        – The first type is the recommendation of additional items for the
           pile, when the user is working on an activity.
        – The second type arises, for instance, when the user is browsing
           the Web; Nepomuk-Simple can suggest that the current resource might be relevant to
           one or more activities performed by the user.
     In both cases there is a need to operate
     with Clouds (fuzzy sets of PIMO nodes): Clouds describe the topicality of documents in
     terms of PIMO, and the pile itself is a Cloud.
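
     A Cloud, understood as a fuzzy set of PIMO nodes, can be represented as a mapping from node to membership degree. The sketch below compares a pile with a web page using a fuzzy Jaccard ratio; this similarity measure, the node names and the membership values are assumptions for illustration, not the method used in Nepomuk-Simple.

```python
def cloud_similarity(cloud_a, cloud_b):
    """Fuzzy Jaccard similarity: sum of min memberships / sum of max memberships."""
    nodes = set(cloud_a) | set(cloud_b)
    overlap = sum(min(cloud_a.get(n, 0.0), cloud_b.get(n, 0.0)) for n in nodes)
    union = sum(max(cloud_a.get(n, 0.0), cloud_b.get(n, 0.0)) for n in nodes)
    return overlap / union if union else 0.0


# Hypothetical clouds: node -> degree of membership in [0, 1]
pile = {"CID": 1.0, "meeting": 0.5}
web_page = {"CID": 0.5, "travel": 0.5}
unrelated_page = {"travel": 1.0}

relevant = cloud_similarity(pile, web_page)
irrelevant = cloud_similarity(pile, unrelated_page)
```

     A page whose Cloud overlaps the pile's Cloud gets a non-zero score and could be suggested for that activity; a page with no shared nodes scores zero.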




Pile UI




Nepomuk use case: activity management


                           A user has started to work on a new project, CID.
                           Using the Nepomuk SSD, she collects a “pile” of
                           resources she needs while working on the project
                                (MS-Word documents, contacts, etc.)
                           by dragging and dropping resources from her desktop and
                           by linking resources from e-mail (Mozilla
                           Thunderbird) and web browser (Firefox)
                           applications.




Nepomuk use case: activity management using IBM recommender
codenamed “Galaxy”


              Galaxy (IBM hybrid recommender)
              analyses the pile content and linkage
              structure
                as a multidimensional network of concepts
                extracted from documents and links between
                concepts, projects, project participants,
                 meetings, document authors, … .
              and provides handy recommendations of
              resources she might possibly need




Nepomuk use case: activity management




     Galaxy can spot what the user might miss:
     “This web page might be relevant to your CID
     activity”




Thank you !




              © 2011 Alexander Troussov

2011 04 troussov_graph_basedmethods-weakknowledge

  • 1. Alexander Troussov, Ph.D., IBM Dublin Software Lab 16th of April 2011, Mathlingvo Seminar, St.Petersburg State University, Russia Graph-based methods to exploit “weak” knowledge © 2011 Alexander Troussov
  • 2. About AT
      IBM Ireland Center for Advanced Studies - Chief Scientist
      IBM LanguageWare group - the Architect
      National Geophysical Data Center, Boulder, CO, USA - Visiting scientist
        – Fuzzy logic based search engine for search in large databases when exact parameters of search are hard to define
      Observatoire de la Côte d’Azur, Nice, France - Visiting scientist
        – Numerical simulation in stochastic physics
      Institute of Physics of the Earth (Russian Academy of Sciences) and the International Institute for Earthquake Prediction Theory and Mathematical Geophysics, Moscow, Russia - Lead Researcher
        – R&D in geophysics and geoinformatics
      System programming at the Institute of Precise Mechanics, Moscow
      PhD in Mathematics from Lomonosov Moscow State University
  • 3. Natural Language Understanding is Inferencing (?)
      From a computational point of view, natural language understanding is inferencing
        – Text which mentions Malahide is probably about Canada (??)
      Malahide (Canada 2006 Census population 8,828) is a township in Elgin County, Ontario, Canada
      Source: Troussov et al. MITACS, Canada, 2010
  • 4. Inferencing
      Terms are ambiguous, and our knowledge is never “the truth, the whole truth, and nothing but the truth”
        – Malahide, Co. Dublin
        – Malahide is a township in Elgin County, Ontario, Canada.
        – Paradis Gisenyi Malahide is a hotel in Rwanda
      Solution (Troussov et al. MITACS, Canada, 2010): propagation from multiple concepts. For instance, the initial seed for the activation propagation is placed at two nodes in a geographical taxonomy, Malahide (Ontario) and Malahide (Co. Dublin), as well as at the other concepts mentioned in the text
        • Text which mentions Malahide and Europe is a little more likely to be about Ireland than about Canada
        • Text which mentions Malahide and Clontarf is more likely to be about Ireland than about Canada
        • …
        • Cohesive, coherent text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf, Donabate is almost certainly about Dublin
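The multi-seed idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny part-of taxonomy, the `SENSES` table, the equal split of activation between senses and the `decay` factor are all hypothetical and are not taken from the cited work (Elgin County is omitted so that both senses of Malahide sit at the same depth).

```python
# Hypothetical, simplified part-of taxonomy (child -> parent).
PART_OF = {
    "Malahide (Co. Dublin)": "Dublin",
    "Clontarf": "Dublin",
    "Dublin": "Ireland",
    "Malahide (Ontario)": "Ontario",
    "Ontario": "Canada",
}

# Hypothetical mapping from a term mention to its candidate concepts.
SENSES = {
    "Malahide": ["Malahide (Co. Dublin)", "Malahide (Ontario)"],
    "Clontarf": ["Clontarf"],
}


def score_regions(mentions, decay=0.5):
    """Seed every sense of every mention and push activation up the taxonomy."""
    activation = {}
    for mention in mentions:
        senses = SENSES[mention]
        for concept in senses:
            # An ambiguous mention splits its unit of activation between senses.
            signal = 1.0 / len(senses)
            node = concept
            while node is not None:
                activation[node] = activation.get(node, 0.0) + signal
                signal *= decay          # the signal decays on each part-of link
                node = PART_OF.get(node)
    return activation


if __name__ == "__main__":
    scores = score_regions(["Malahide", "Clontarf"])
    print(scores["Ireland"], scores["Canada"])
```

With "Malahide" alone the two countries tie in this toy setup; adding "Clontarf" tips the balance towards Ireland, which is the behaviour the bullets above describe.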
  • 5. Knowledge, Lexico-Semantic Resource Text Relevancy
  • 6. Text – Semantic Network [Diagram: Mention nodes in the TEXT layer linked to the NETWORK OF CONCEPTS layer; labels: “Mapping of term mentions to concepts”, “Finding “focus” concept”]
  • 7. NLU as inferencing
      The concept of a car is relevant to a text.
      Car IS-A “on-land travel” (?)
      Therefore “on-land travel” is somewhat relevant to the text, …
  • 8. Text – Semantic Network [Diagram: Mention nodes in the TEXT layer linked to the NETWORK OF CONCEPTS layer; labels: “Mapping of term mentions to concepts”, “Finding “focus” concept”]
  • 9. Demo – 2 1 Spreading Activation.pdf
  • 10. Agenda Introduction Building Semantic Model SA Research Challenges – Why SA – Reliability of inferencing – What is the purpose of graph operations Centrality, network flow methods Zoo of algorithms Nepomuk Recommender
  • 11. Text – Semantic Network [Diagram: Mention nodes in the TEXT layer linked to the NETWORK OF CONCEPTS layer; labels: “Mapping of term mentions to concepts”, “Finding “focus” concept”]
  • 12. Spreading Activation Methods
  • 13. There is an increased need for a new generic and formal understanding of spreading activation as a class of algorithms rather than a particular algorithm with many parameters
      Spreading activation (also known as spread of activation) is a method for searching associative networks, neural networks or semantic networks. The method is based on the idea of quickly spreading an associative relevancy measure over the network. Our goal is to give an expanded introduction to the method. We will demonstrate and describe in sufficient detail that this method can be applied to very diverse problems and applications. We present the method as a general framework: first, as a very general class of algorithms on large (or very large) so-called multidimensional networks, which serve as the mathematical model.
      Source: Troussov, Levner, Bogdan, Judge, Botvich “Spreading activation methods”
  • 14. We present spreading activation in a generic form, as a set of methods suitable for mining multidimensional networks with oriented weighted links. These graph-mining methods might produce results similar to those achieved by soft clustering and fuzzy inferencing. The input object is a function on the nodes of the network, and the spread of activation is a technique which “spreads” this function through the network links. The result of the spreading activation is a new function on the nodes. The properties of that function strongly depend on the original function and on the parameters of the spreading activation. For instance, when the underlying network is a network of ontological concepts, the parameters governing the spread might be chosen so that the original function is “smoothed” and the resulting function can be interpreted as a “conceptual” summary of the initial non-zero valued nodes.
  • 15. Origin of Spreading Activation Methods
      In neurophysiology, interactions between neurons are modeled by way of activation which propagates from one neuron to another via connections called synapses, which transmit information using chemical signals. The first spreading activation models were used in cognitive psychology to model the processes of memory retrieval (Collins, A.M. & Loftus, E.F., 1975; Anderson, J., 1983). This framework was later exploited in Artificial Intelligence (AI) as a processing framework for semantic networks and ontologies, and applied to Information Retrieval (Crestani, F., 1997; Aleman-Meza, Halaschek, Arpinar, & Sheth, 2003; Rocha, C., Schwabe, D. & Poggi de Aragao, M., 2004; …) as the result of direct transfer of information retrieval ideas from cognitive sciences to AI.
  • 16. Notation
      A multidimensional network can be modeled as a directed graph, which is a pair G = (V, E), where
        V – the set of vertices v_i
        E – the set of edges e_j (in oriented graphs, edges are also referred to as arcs)
        init: E → V – the mapping which gives the initial node of each arc
        term: E → V – the mapping which gives the terminal node of each arc
        imp – the importance value of arcs and nodes. For instance, imp(v), where the node v is a geographical location, might be its population; imp(e) might be the number of phone calls from person init(e) to person term(e)
        w – “weights”, for instance, a sigmoidal function of imp. w(e_j) = 0 means that the arc e_j is effectively ignored; w(e_j) = 1 means that the activation of init(e_j) strongly affects the activation of term(e_j). For instance, when the nodes represent “words”, synonym links might be assigned the value 1
        F – the “activation” function, usually a real-valued function on the nodes of the network
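One possible way to hold this notation in code is sketched below. The class and field names (`Arc`, `Network`, `kind`) and the particular squashing function in `weight` are assumptions made for illustration; the slide only says that w may be "a sigmoidal function of imp", so `tanh` with a hypothetical `scale` parameter is just one choice that maps imp = 0 to weight 0 and large imp to a weight close to 1.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Arc:
    init: str    # initial node, init(e)
    term: str    # terminal node, term(e)
    kind: str    # link type, i.e. the "dimension" of the network (assumed field)
    imp: float   # importance, e.g. number of phone calls from init to term


@dataclass
class Network:
    arcs: list = field(default_factory=list)

    def vertices(self):
        """V: every node that appears as an end point of some arc."""
        return {a.init for a in self.arcs} | {a.term for a in self.arcs}

    def outgoing(self, v):
        """Arcs e with init(e) = v."""
        return [a for a in self.arcs if a.init == v]

    def incoming(self, v):
        """Arcs e with term(e) = v."""
        return [a for a in self.arcs if a.term == v]


def weight(arc, scale=10.0):
    """Assumed sigmoidal weight: 0 for imp = 0, approaching 1 for large imp."""
    return math.tanh(arc.imp / scale)


if __name__ == "__main__":
    net = Network([Arc("alice", "bob", "phone_call", 25.0)])
    print(net.vertices(), weight(net.arcs[0]))
```

The activation function F is then simply a dictionary from node names to real values.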
  • 17. Generic description of the spreading activation methods (SAM) framework
      1. Initialisation
         Sets the parameters of the algorithm, the network, and the initial F as a list of non-zero valued nodes V_n
      2. Iterations (each iteration is one pulse of SAM)
        – a. List Expansion: the list is expanded to include neighbors (both neighbors reached by following outgoing links and neighbors which have links to the nodes in the list). Newly added nodes receive a zero level of activation
        – b. Recomputation: the value at each node in the list is recomputed based on the values of the function on the nodes which have links to the given node, and on the types of the connections
        – c. List Purging: nodes with values less than a threshold are excluded from the list
        – d. Conditions Check: iterations stop when a break condition is met, such as a maximum number of iterations
      3. Output
         The list of nodes (with the values of the function after the spread of activation), ranked according to their F values
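The framework steps above can be sketched as a short Python function. This is a minimal sketch under assumptions, not the author's implementation: arcs are plain `(init, term, weight)` tuples, link types are ignored, the recomputation rule is the simple additive one (new value = old value + incoming signal), and the parameter names `pulses`, `threshold` and `beta` are chosen here for illustration.

```python
def spread_activation(arcs, seeds, pulses=3, threshold=0.01, beta=1.0):
    """arcs: list of (init, term, weight); seeds: {node: initial activation}."""
    out_arcs, in_arcs = {}, {}
    for init, term, w in arcs:
        out_arcs.setdefault(init, []).append((term, w))
        in_arcs.setdefault(term, []).append((init, w))

    F = dict(seeds)                          # 1. initialisation
    for _ in range(pulses):                  # 2d. break after a fixed number of pulses
        # 2a. list expansion: neighbours in both directions join with zero activation
        for v in list(F):
            for u, _w in out_arcs.get(v, []) + in_arcs.get(v, []):
                F.setdefault(u, 0.0)
        # 2b. recomputation: each node adds the decayed signals on its incoming arcs
        new_F = {}
        for v in F:
            incoming = 0.0
            for u, w in in_arcs.get(v, []):
                if u in F:
                    incoming += F[u] / len(out_arcs[u]) ** beta * w
            new_F[v] = F[v] + incoming
        # 2c. list purging: drop nodes whose activation is below the threshold
        F = {v: a for v, a in new_F.items() if a >= threshold}
        if not F:
            break
    # 3. output: nodes ranked by activation, highest first
    return sorted(F.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    arcs = [("a", "c", 1.0), ("b", "c", 1.0), ("a", "d", 1.0)]
    print(spread_activation(arcs, {"a": 1.0, "b": 1.0}, pulses=1))
```

In the usage example, node "c" receives activation from both seeds and ends up ranked first, ahead of the seeds themselves.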
  • 18. Generic description of recomputation phase
      We have the list of nodes V_n.
      1. Input/Output Through Links Computation
        – For each node v we compute the input signal to each arc e such that init(e) = v. When the signal (“activation”) passes through a link e, it usually experiences decay by a factor w(e)
      2. Input/Output of Node Activation
        – Before the pulse, the node v has the activation level F(v)
          • Through incoming links, v gets more activation. By dissipating activation through outgoing links, the node v might lose activation
      3. Computation of the New Level of Activation
        – A new value F(v) is computed based on F(v), Input(v), and Output(v)
  • 19. Generic description of recomputation phase
      1. Input/Output Through Links Computation
         For each node v we compute the input signal to each arc e such that init(e) = v. This computation can be based on the value F(v), the outdegree of the node, etc. For instance, if the node v has n outgoing arcs of the same type, each arc e might get the input signal:
           I(e) = F(init(e)) · (1 / outdegree(v)**beta)
         where beta might be equal to 1. It could also be less than one, in which case the node v propagates more activation to its neighbors than it has.
         When the signal (“activation”) passes through a link e, it usually experiences decay by a factor w(e):
           O(e) = I(e) · w(e)
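The two link formulas can be checked numerically with a small helper. The function name `link_signals` and its argument layout are assumptions for illustration; the arithmetic follows the formulas for I(e) and O(e) given on the slide.

```python
def link_signals(activation, out_weights, beta=1.0):
    """Return (I, O) for the outgoing arcs of one node v.

    activation  -- F(v)
    out_weights -- list of w(e) for every arc e with init(e) = v
    """
    if not out_weights:
        return [], []
    outdegree = len(out_weights)
    signal_in = activation / outdegree ** beta       # I(e) = F(v) / outdegree(v)**beta
    inputs = [signal_in for _ in out_weights]
    outputs = [signal_in * w for w in out_weights]   # O(e) = I(e) * w(e)
    return inputs, outputs


if __name__ == "__main__":
    print(link_signals(1.0, [1.0, 0.5]))            # beta = 1: the inputs sum to F(v)
    print(link_signals(1.0, [1.0] * 4, beta=0.5))   # beta < 1: the inputs sum to more than F(v)
```

With beta = 1 the input signals of all outgoing arcs add up to exactly F(v); with four arcs and beta = 0.5 they add up to twice F(v), which is the "propagates more activation than it has" case.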
  • 20. Generic description of input/output phase
      2. Input/Output of Node Activation
         Before the pulse, the node v has the activation level F(v). Through incoming links, v gets more activation:
           Input(v) = Σ O(e) over all links e such that init(e) ∈ V_n, term(e) = v
         By dissipating activation through outgoing links, the node v might lose activation:
           Output(v) = Σ I(e) over all links e such that init(e) = v, term(e) ∈ V_n
  • 21. Generic description of recomputation phase
      3. Computation of the New Level of Activation
         A new value F(v) is computed based on F(v), Input(v), and Output(v), for example:
           Fnew(v) = F(v) + Input(v)
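The whole recomputation phase (link signals, node input and output, new activation) can be put together in one step function. This is a sketch under assumptions: arcs are `(init, term, weight)` tuples, the outdegree counts all arcs leaving a node, and the update uses the example rule Fnew(v) = F(v) + Input(v) from the slide.

```python
def recompute(F, arcs, beta=1.0):
    """One recomputation step over the current node list.

    F    -- {node: activation} for the nodes currently in the list V_n
    arcs -- list of (init, term, weight)
    Returns (new_F, Input, Output) as dictionaries keyed by node.
    """
    outdegree = {}
    for init, _term, _w in arcs:
        outdegree[init] = outdegree.get(init, 0) + 1

    node_input = {v: 0.0 for v in F}
    node_output = {v: 0.0 for v in F}
    for init, term, w in arcs:
        if init in F and term in F:
            i_e = F[init] / outdegree[init] ** beta   # I(e)
            node_output[init] += i_e                  # Output(v) sums I(e)
            node_input[term] += i_e * w               # Input(v) sums O(e) = I(e) * w(e)

    new_F = {v: F[v] + node_input[v] for v in F}      # Fnew(v) = F(v) + Input(v)
    return new_F, node_input, node_output


if __name__ == "__main__":
    print(recompute({"a": 1.0, "b": 0.0}, [("a", "b", 0.5)]))
```

Output(v) is returned as well, so that other update rules can be tried, for example one that subtracts the dissipated activation; that variant is an assumption here, not something the slide prescribes.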
  • 22. SAM and Methods of Numerical Simulation in Physics
      Spreading activation algorithms were introduced in the 1990s; however, the same iterative methods were used long before in numerical simulation in physics, mechanics, chemistry and the engineering sciences. The major distinctions between those algorithms and what is now called spreading activation are:
        – a) in physics, such algorithms usually work on a regular mesh (so that the local topology of the graph is encoded in the formulas of the recomputation stage)
        – b) in physics, initial conditions (the initial activation) are usually assigned to all nodes of the mesh, so algorithms for efficient graph traversal are not needed. For instance, steps 2a (List Expansion) and 2c (List Purging) in the generic description of the SAM framework might be skipped.
      For instance, the one-dimensional heat transfer equation might be numerically simulated on a one-dimensional mesh by iterative methods. On each iteration, the recomputation stage is based on the formula:
           Fnew(v) = ( F(RightNeighbor(v)) + F(LeftNeighbor(v)) ) / 2
      Using a different formula, one can simulate the behavior of an oscillating string (although this requires storing three values at each node: the position, mass and velocity of the material point corresponding to the node).
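The averaging formula for the one-dimensional mesh is easy to try out. The sketch below assumes that the two end points of the mesh are held fixed (a boundary condition the slide does not specify) and uses hypothetical function names.

```python
def heat_step(values):
    """One pulse: every interior point becomes the mean of its two neighbours."""
    new = list(values)                      # end points are kept fixed (assumed)
    for i in range(1, len(values) - 1):
        new[i] = (values[i - 1] + values[i + 1]) / 2
    return new


def simulate(values, pulses):
    """Apply the same recomputation rule for a given number of pulses."""
    for _ in range(pulses):
        values = heat_step(values)
    return values


if __name__ == "__main__":
    print(heat_step([0.0, 0.0, 4.0, 0.0, 0.0]))
    print(simulate([1.0, 0.0, 0.0, 0.0, 1.0], 200))
```

There is no list expansion or purging here: every mesh point is active from the start, and the neighbourhood structure is hard-wired into the index arithmetic. With both ends held at 1.0, repeated pulses drive all points towards the same value.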
  • 23. SAM and Methods of Numerical Simulation in Physics
      Using the same iterative algorithm, with one set of parameters one can emulate heat transfer; with another set of parameters the same algorithm shows the behavior of oscillating strings. But the phenomena of heat propagation and string oscillation are quite different (for instance, heat propagation might lead to “thermal death”, the state of equilibrium where the level of activation is the same for all nodes, while oscillation might continue forever). Our illustration concerns only the basics, while real modeling might be much more complicated: for instance, heat transfer might lead to combustion, where after reaching some level of activation a node generates more “heat” than it gets from neighboring nodes.
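To contrast the two phenomena, here is a sketch of the oscillating string on the same kind of one-dimensional mesh. Several things are assumptions made for illustration: unit masses and the time step are folded into a single stiffness constant `k`, the ends of the string are fixed, the initial shape is a triangular "pluck", and the update is a simple semi-implicit scheme (velocities first, then positions).

```python
def string_step(pos, vel, k=0.1):
    """One pulse of a fixed-end string: update velocities, then positions."""
    n = len(pos)
    new_vel = list(vel)
    for i in range(1, n - 1):
        # Each interior point is pulled towards the mean of its neighbours.
        new_vel[i] = vel[i] + k * (pos[i - 1] + pos[i + 1] - 2 * pos[i])
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


def midpoint_trace(n=11, pulses=1000):
    """Displacement of the middle point after each pulse."""
    mid = n // 2
    pos = [1.0 - abs(i - mid) / mid for i in range(n)]   # triangular "pluck"
    vel = [0.0] * n
    trace = []
    for _ in range(pulses):
        pos, vel = string_step(pos, vel)
        trace.append(pos[mid])
    return trace


if __name__ == "__main__":
    trace = midpoint_trace()
    print(max(trace[-100:]), min(trace[-100:]))
```

The loop has the same shape as a heat-transfer iteration, but each node carries a velocity in addition to its position. In this sketch the middle point keeps swinging between positive and negative displacement after many pulses instead of settling to an equilibrium.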
  • 25. Spreading Activation as a Graph-Mining Technique
      The technique of SAM is quite polymorphic. On this slide we interpret the results of spreading activation in terms of graph mining.
        – First of all, one can expect that after running SAM the most activated nodes will be those which get activation from multiple sources or, in other words, those which minimize the “distance” to the initially activated nodes. Therefore these nodes might be considered potential centroids of strong clusters induced by the initial activation. Since a partitioning of the nodes according to these clusters is not immediately available (and is not needed in many applications), SAM algorithms might be considered methods of soft clustering.
        – On the other hand, the most activated nodes are those which are connected to the initial conditions by particular types of directed links (arcs with large weights). Therefore we might consider SAM an efficient scheme for computing fuzzy inferencing. For such applications, replacing the single-valued function F by a vector function might be useful.
      We conclude by noting that SAM algorithms might be used for soft clustering and fuzzy inferencing on networks.
  • 26. Γαλλία People Παρίσι Ναπολέων Αλέξανδρος Geographical artifacts Relations • Friends • Part of, Instance of, Subclass • Created
  • 27. France Russia Paris Moscow Napoleon Alexander Borodino Kutuzov Meeting: Battle of Austerlitz Meeting: Battle of Borodino Project: Invasion of Russia 27 © 2011 Alexander Troussov
  • 28. Diagram on the previous slide … What does it represent? How can it be used?
  • 29. France Russia Paris Moscow Napoleon Alexander Borodino Kutuzov Meeting: Battle of Austerlitz Meeting: Battle of Borodino Project: Invasion of Russia How could this diagram be used? 1. A network flow process could show the nodes most relevant to the pair “Napoleon” & “Meeting” - Selection WHO: whom to invite - Other nodes: explain recommendations 2. When Napoleon opens an email or a web page containing W&P (“War and Peace”), he will be advised that the content of this resource is relevant to his project “Invasion of Russia”.
  • 30. Diagram on the previous slide … What does it represent? Data from Facebook, data from Napoleon’s Lotus Notes calendar, the structure of a Wiki, a network of collocations or relations between the entities in W&P, … – The proliferation of Web 2.0 and Enterprise 2.0 technologies has led to the emergence of massive networks connecting people and various digital artifacts. These networks can be treated as “weak” knowledge, which nevertheless might be used for recommendations and even for such traditional applications as knowledge-based text processing. Or an instantiation of an ontology related to W&P by Leo Tolstoy – In which case we would probably know that Napoleon is the emperor of France, Paris is the capital (not an instantiation of a subclass) of France, etc. An ontology provides conceptualization and allows inferencing, but these advantages per se are useless without tedious manual work to encode the rules for how to use this additional knowledge. In contrast, the knowledge encoded in the topology of the multidimensional network is ready to use, provided that the methods are tolerant to errors and inconsistencies in the data, i.e. the methods are methods of “soft mathematics”: fuzzy inferencing, soft clustering, …
  • 31. Social Context = Knowledge ? A New Mathematical Model of Horse Racing Assume, without loss of generality, that each horse in a horse race is modelled by a wooden ball of radius $R_i$. Horse = a ball? ☺
  • 32. Representing social context as knowledge allows us to benefit from the experience of knowledge-based applications.
  • 33. For instance, the social context modeled as a network is not much different from the semantic networks which are formed from concepts represented in ontologies. And it is possible to use such networks for knowledge-based text processing. Representing social context as knowledge allows us to draw on experience from such a mature R&D area as knowledge-based text processing
  • 34. How to model the social context As multidimensional networks – The primary source: network models of instantiations of techno-social systems As “knowledge” – represented as objects, clauses, XML, graphs, or some combination of these
  • 35. The primary source – network models of techno-social systems Log files of techno-social systems (like Facebook or IBM’s Lotus Connections) keep track of who did what (Invited, Joined, Created). Triples could be aggregated into a network.
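A minimal sketch of aggregating such triples into a typed, weighted network follows. The log entries, the node naming scheme and the `_by` suffix for reverse links are hypothetical; real log formats of Facebook or Lotus Connections are not shown in the slides.

```python
from collections import defaultdict


def build_network(log):
    """Aggregate (actor, action, target) triples into a typed, weighted adjacency map."""
    network = defaultdict(lambda: defaultdict(int))
    for actor, action, target in log:
        network[actor][(action, target)] += 1         # forward link, counted per repetition
        network[target][(action + "_by", actor)] += 1  # reverse link, so both ends are reachable
    return network


# Hypothetical log entries in the spirit of the Napoleon example.
log = [
    ("napoleon", "created", "project:invasion_of_russia"),
    ("napoleon", "invited", "kutuzov"),
    ("kutuzov", "joined", "meeting:battle_of_borodino"),
    ("napoleon", "joined", "meeting:battle_of_borodino"),
    ("napoleon", "invited", "kutuzov"),
]
network = build_network(log)

for link, count in network["napoleon"].items():
    print(link, count)
```

Repeated events simply increase the weight of an existing link, which is one simple way to merge heterogeneous log data into a single graph.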
  • 36. Examples of Graph Models: Folksonomies: – Tripartite Hypergraph Social bookmarking systems (Del.icio.us, …) – Where to keep my bookmarks? – Users (actors), resources, tags In social bookmarking systems users describe bookmarks by keywords called tags. The structure behind these social systems, called folksonomies, can be viewed as a tripartite hypergraph of actor, tag and resource nodes. – Three types of first-class citizens, and hyperplanes – If hyperplanes are made from rubber, they could be shrunk to a node, so the hyperplanes will also be first-class citizens Advantages of the network models (see next slide) – Extensibility – Ease of merging heterogeneous information Source: Hypergraphs: see Jäschke et al. "Logsonomy - A Search Engine Folksonomy", ICWSM 2008, AAAI Press (2008)
  • 37. Inferencing – “Soft methods” could provide reliable inferencing For instance, the social context modeled as a network is not much different from the semantic networks which are formed from concepts represented in ontologies. And it is possible to use such networks for knowledge-based text processing. Representing social context as knowledge allows us to draw on experience from such a mature R&D area as knowledge-based text processing
  • 38. Natural Language Understanding is Inferencing (?) From a computational point of view, natural language understanding is inferencing – Text which mentions Malahide is probably about Canada (??) Malahide (Canada 2006 Census population 8,828) is a township in Elgin County, Ontario, Canada Source: Troussov et al. MITACS, Canada, 2010
  • 39. Inferencing Terms are ambiguous, and our knowledge is never “the truth, the whole truth, and nothing but the truth” – Malahide, Co. Dublin – Malahide is a township in Elgin County, Ontario, Canada. – Paradis Gisenyi Malahide is a hotel in Rwanda Solution (Troussov et al. MITACS, Canada, 2010): propagation from multiple concepts. For instance, the initial seed for the activation propagation consists of two nodes in a geographical taxonomy, Malahide (Ontario) and Malahide (Co. Dublin), as well as the other concepts mentioned in the text • Text which mentions Malahide and Europe – is a little bit more likely to be about Ireland than about Canada • Text which mentions Malahide and Clontarf – is more likely to be about Ireland than about Canada • … • Cohesive coherent text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf, Donabate – is almost certainly about Dublin Such a rapid “phase transition” from uncertainty to certainty is similar to the transition at the percolation threshold
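The disambiguation idea can be sketched on a toy geographical taxonomy. Everything here is an assumption made for illustration: the simplified taxonomy (which omits Elgin County), the equal split of a term's activation among its senses, and the distance-based `decay`. It is not the algorithm of Troussov et al., only a sketch of propagation from multiple seeds.

```python
from collections import defaultdict, deque

# Toy taxonomy (child -> parent), deliberately simplified.
PARENT = {
    "Europe": "World", "North America": "World",
    "Ireland": "Europe", "Canada": "North America",
    "Co. Dublin": "Ireland", "Ontario": "Canada",
    "Malahide (Co. Dublin)": "Co. Dublin", "Clontarf": "Co. Dublin",
    "Donabate": "Co. Dublin", "Malahide (Ontario)": "Ontario",
}
# Ambiguous terms map to several taxonomy nodes.
SENSES = {
    "Malahide": ["Malahide (Co. Dublin)", "Malahide (Ontario)"],
    "Clontarf": ["Clontarf"], "Donabate": ["Donabate"], "Europe": ["Europe"],
}
ADJACENT = defaultdict(set)
for child, parent in PARENT.items():
    ADJACENT[child].add(parent)
    ADJACENT[parent].add(child)


def topic_activation(terms, decay=0.5):
    """Spread one unit of activation per term, split equally among its senses."""
    total = defaultdict(float)
    for term in terms:
        for sense in SENSES[term]:
            reached = {sense: 1.0 / len(SENSES[term])}
            queue = deque([sense])
            while queue:  # breadth-first spread, attenuated at every hop
                node = queue.popleft()
                for other in ADJACENT[node]:
                    if other not in reached:
                        reached[other] = reached[node] * decay
                        queue.append(other)
            for node, amount in reached.items():
                total[node] += amount
    return total


for text in (["Malahide"], ["Malahide", "Europe"], ["Malahide", "Clontarf", "Donabate"]):
    scores = topic_activation(text)
    print(text, "Ireland:", scores["Ireland"], "Canada:", scores["Canada"])
```

With “Malahide” alone the two countries tie; each additional unambiguous mention shifts the balance further towards Ireland.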
  • 40. From Uncertainty to Certainty in Inferencing: phase transitions as a function of seed size, in analogy to those in percolation In (semantic) networks with high local density the reliability of inferencing from a single concept is almost never sufficient, and reliability could be low when inferencing starts from a small number of seed concepts, but inferencing becomes very reliable once the number of initial seed concepts reaches some level (which could be explained by combinatorics) [Figure: reliability of inferencing as a function of the number of nodes in the seed]
  • 41. And could be explained by combinatorics A graph showing the approximate probability of at least two people sharing a birthday amongst a certain number of people. In probability theory, the birthday problem, or birthday paradox, pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 366 (ignoring February 29 births). But perhaps counter-intuitively, 99% probability is reached with a mere 57 people, and 50% probability with 23 people. 41 © 2011 Alexander Troussov
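The figures quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch assuming 365 equally likely birthdays; the function name is chosen for illustration.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday (uniform birthdays)."""
    if people > days:
        return 1.0  # pigeonhole principle
    all_distinct = 1.0
    for k in range(people):
        all_distinct *= (days - k) / days
    return 1.0 - all_distinct


for n in (5, 10, 23, 57, 366):
    print(n, round(shared_birthday_probability(n), 4))
```

The probability climbs slowly for small groups and then very quickly, which is the same shape as the reliability curve discussed for seed sizes.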
  • 42. Simulation The network (such as a taxonomy of geographical locations) is a tree of 20,000 nodes. Text is modeled as a list of 100 terms, each of which is ambiguous and could be mapped to 8 network nodes. When such a mapping happens, we consider that the node (the geographical location represented by the node) could be relevant to the text. We are looking for clusters: groups of N nodes, each of which is mentioned in the text, such that the graph distance between each pair of nodes in the cluster is less than three. Such graph structures may occur by chance for small N (N=1 or 2), but their probability sharply decreases to zero for bigger N; correspondingly, our certainty that the graph structure signifies the topicality of the text increases to 1.0 – Text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf, Donabate – is almost certainly about Dublin (Ireland) Source: F. Darena and A. Troussov 2010
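A rough Monte Carlo sketch of this experiment is given below. The tree size and the numbers of terms and senses come from the slide; everything else is an assumption: the tree is a random recursive tree rather than a real taxonomy, the mentioned nodes are drawn uniformly at random, and clusters are counted as star-shaped neighbourhoods (a node, its parent and its children), which are the groups in a tree whose pairwise distances are all below three.

```python
import random
from collections import Counter


def chance_cluster_histogram(nodes=20000, terms=100, senses=8, seed=0):
    """Histogram {cluster size: count} of clusters formed by randomly mentioned nodes."""
    rng = random.Random(seed)
    # Random recursive tree: node i attaches to a uniformly chosen earlier node.
    parent = [None] + [rng.randrange(i) for i in range(1, nodes)]
    children = [[] for _ in range(nodes)]
    for child in range(1, nodes):
        children[parent[child]].append(child)
    # Every ambiguous term "mentions" several random nodes.
    mentioned = {rng.randrange(nodes) for _ in range(terms * senses)}
    # size[c] = number of mentioned nodes among c, its parent and its children.
    size = Counter()
    for node in mentioned:
        size[node] += 1
        if parent[node] is not None:
            size[parent[node]] += 1
        for child in children[node]:
            size[child] += 1
    return Counter(size.values())


histogram = chance_cluster_histogram()
for n in sorted(histogram):
    print("clusters of size", n, ":", histogram[n])
```

In such a run, small chance clusters are common while larger ones become rare quickly, so a large cluster found in a real text is unlikely to be an accident.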
  • 43. Processes in Networks How do we study the Earth? – By looking at the results of the propagation of waves through the Earth Propagation of seismic waves in the ground and the effect of the presence of a land mine Similarly, one can study networks by network flow methods, introducing processes where something is flowing from node to node across the edges
  • 44. Processes Used goods: trail. Money: walk. Gossip: replication rather than transference (trails rather than walks). E-mail: diffusion by replication. Attitudes: spread through replication rather than transfer. Infection: spreads like gossip, but does not re-infect. Packages: usually the shortest route possible. Relevancy in semantic networks. Trust: shortest path or volume?
  • 45. 45 © 2011 Alexander Troussov
  • 46. We are talking about the consumability of centrality measurements produced by network flow methods like these (DEMO)
  • 47. Key difference between SNA and other approaches to social science Social sciences usually focus on attributes of individual actors
  • 48. Key difference between SNA and other approaches to social science SNA focuses on relationships between actors “Social network analysis reflects a shift from the individualism common in the social sciences towards a structural analysis”. Garton et al. Studying Online Social Networks Structuralism is an approach to the human sciences that attempts to analyze a specific field (for instance, mythology) as a complex system of interrelated parts: the linguists Roman Jakobson and Nikolai Trubetzkoy, the anthropologist Lévi-Strauss ~ Complex systems Sociogram: – Jacob Levy Moreno (1889-1974) was a leading Austrian-American psychiatrist and psychosociologist, thinker and educator, the founder of psychodrama, and the foremost pioneer of group psychotherapy. Among Moreno’s primary contributions to sociometrics was the sociogram. The sociogram is a method of representing individuals as points on graphs and using lines and arcs to represent the relationships between the individuals. Graphics from Prof. Hendrik Speck's tutorial at the 5th Karlsruhe Symposium for Knowledge Management in Theory and Praxis, 2007
  • 49. Prominence The study of structural properties of networks and their interplay with the processes taking place on the network has been one of the main problems of recent years in the field of complex network analysis. A primary use of graph theory in social network analysis is to identify “important” actors. Centrality and prestige concepts seek to quantify graph theoretic ideas about an individual actor’s prominence within a network by summarizing structural relations among the graph nodes. An actor’s prominence reflects its greater visibility to the other network actors (an audience). An actor’s prominent location takes account of the direct sociometric choices made and choices received (outdegrees and indegrees), as well as the indirect ties with other actors. The two basic prominence classes: – Centrality: Actor has high involvement in many relations, regardless of send/receive directionality (volume of activity) – Prestige: Actor receives many directed ties, but initiates few relations (popularity > extensivity) Source: Wasserman & Faust, “Social Network Analysis” (W&F)
  • 50. Centrality: Eigenvector Centrality Eigenvector centrality was introduced by Phillip Bonacich in 1987 “Google's workhorse search engine ranking algorithm, PageRank, is actually a variant on an SNA concept - Bonacich Power Centrality. – Bonacich (1987) hypothesized that someone's power in society depends on the power of his or her social contacts. Bonacich formalized this mathematically: $c_i = B(c_1 R_{i1} + c_2 R_{i2} + \dots + c_n R_{in})$, where $c_i$ is the person in question, $B$ is the magnitude of the effect, and $R_{ij}$ is the strength of the relationship between the person in question, $i$, and each of the other people, $j$, under consideration. If $B=1$, the formula becomes eigenvector centrality, of which PageRank is a variant. Now, Page, et al. (1998) do not cite Bonacich, I am not claiming that they stole the idea - I am merely stating that a social network analyst appears to me to have been the first to think up the concept”. Solomon Messing http://www.stanford.edu/~messing/RforSNA.html
  • 51. Centrality and the network flow methods Most centrality measures are based on the network flow process, “that focuses on the outcomes for nodes in a network where something is flowing from node to node across the edges” (Borgatti and Everett, 2006). We interpret this “something” as a relevancy measure; for instance, the initial seed input value shows the nodes of interest in the network. Propagating the relevancy measure through outgoing links allows us to compute the relevancy measure for other network nodes and dynamically rank these nodes according to the relevancy measures. The same paradigm could be used to address centrality measurements in social network analysis. Centralisation of the network can be achieved when we assume that all the nodes are equally important, and iteratively recompute the relevancy measure based on the connections between nodes.
  • 52. Master Equation Numerical Solution Bonacich Power Centrality, Eigenvector Centrality, Google’s PageRank – “Google's workhorse search engine ranking algorithm, PageRank, is actually a variant on an SNA concept - Bonacich Power Centrality. Bonacich (1987) hypothesized that someone's power in society depends on the power of his or her social contacts. Bonacich formalized this mathematically: $c_i = B(c_1 R_{i1} + c_2 R_{i2} + \dots + c_n R_{in})$, where $c_i$ is the person in question, $B$ is the magnitude of the effect, and $R_{ij}$ is the strength of the relationship between the person in question, $i$, and each of the other people, $j$, under consideration. If $B=1$, the formula becomes eigenvector centrality, of which PageRank is a variant. Now, Page, et al. (1998) do not cite Bonacich, I am not claiming that they stole the idea - I am merely stating that a social network analyst appears to me to have been the first to think up the concept”. Solomon Messing http://www.stanford.edu/~messing/RforSNA.html
  • 53. Master Equation Numerical Solution Computation Master equation easily leads us to a numerical solution 53 © 2011 Alexander Troussov
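As an illustration of how a master equation of the form $c_i = B \sum_j R_{ij} c_j$ leads to a numerical procedure, here is a minimal power-iteration sketch. It starts from all nodes being equally important and repeatedly recomputes the scores from the connections. The rescaling by the largest score stands in for the factor $B$; the four-node relationship matrix is hypothetical, and power iteration is a standard technique, not necessarily the one used in the talk.

```python
def eigenvector_centrality(R, iterations=200):
    """Iterate c_i <- sum_j R[i][j] * c_j, rescaling so that the largest score is 1."""
    n = len(R)
    c = [1.0] * n  # start with all nodes equally important
    for _ in range(iterations):
        raw = [sum(R[i][j] * c[j] for j in range(n)) for i in range(n)]
        scale = max(raw)
        c = [value / scale for value in raw]
    return c


# Hypothetical symmetric relationship matrix: a triangle 0-1-2 with node 3 attached to node 0.
R = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(R)
print([round(value, 3) for value in centrality])
```

Note that plain power iteration of this kind converges for connected, non-bipartite networks such as this one; on bipartite networks (for example a star) the scores oscillate and a shifted or damped variant is needed.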
  • 54. It is great to have “the right master equation”! What is the shape of a hanging chain? – What is the shape of a hanging chain when supported at its ends and acted on only by its own weight? Plotting geometric arrangements and forces acting on small segments of the chain Integrating the results 54 © 2011 Alexander Troussov
  • 55. It is great to have “the right master equation”! What is the shape of a hanging chain? What is the shape of a hanging chain when supported at its ends and acted on only by its own weight? • Galileo: “This chain will assume the form of a parabola” $y = x^2$ Plotting geometric arrangements and forces acting on small segments of the chain Integrating the results
  • 56. It is great to have “the right master equation”! What is the shape of a hanging chain? What is the shape of a hanging chain when supported at its ends and acted on only by its own weight? • Galileo: “This chain will assume the form of a parabola” $y = x^2$ • But the shape is different: $y = \frac{a}{2}\left(e^{x/a} + e^{-x/a}\right)$, which was established later by applying calculus Plotting geometric arrangements and forces acting on small segments of the chain Integrating the results “In 1669, Jungius disproved Galileo's claim that the curve of a chain hanging under gravity would be a parabola (MacTutor Archive). The curve is also called the alysoid and chainette. The equation was obtained by Leibniz, Huygens, and Johann Bernoulli in 1691 in response to a challenge by Jakob Bernoulli”. Leibniz's solution is on the left. Huygens's illustration is on the right. http://mathworld.wolfram.com/Catenary.html
  • 57. “Plotting geometric arrangements and forces acting on small segments” evolved into – Finite difference method • In mathematics, finite-difference methods are numerical methods for approximating the solutions to differential equations using finite difference equations to approximate derivatives. – Stencil • In mathematics, especially the areas of numerical analysis concentrating on the numerical solution of partial differential equations, a stencil is a geometric arrangement of a nodal group that relates to the point of interest by using a numerical approximation routine. Stencils are the basis for many algorithms to numerically solve partial differential equations.
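A three-point stencil is enough to test candidate shapes of the hanging chain numerically. The sketch below relies on the standard differential equation of the hanging chain, $y'' = \frac{1}{a}\sqrt{1 + y'^2}$, which is not stated on the slides and is brought in here as an assumption; the function names and the step `h` are also illustrative.

```python
import math


def chain_residual(y, x, a, h=1e-3):
    """Residual of y'' = sqrt(1 + y'^2) / a at x, using a three-point stencil."""
    first = (y(x + h) - y(x - h)) / (2 * h)             # central difference for y'
    second = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)  # central difference for y''
    return second - math.sqrt(1 + first * first) / a


a = 1.0


def catenary(x):
    return a * math.cosh(x / a)  # equals (a / 2) * (e^(x/a) + e^(-x/a))


def parabola(x):
    # Parabola with the same height and curvature as the catenary at the lowest point.
    return a + x * x / (2 * a)


print("catenary residual at x=1:", chain_residual(catenary, 1.0, a))
print("parabola residual at x=1:", chain_residual(parabola, 1.0, a))
```

The catenary satisfies the equation up to discretisation error, while the parabola, which looks similar near the lowest point, leaves a clearly non-zero residual away from it.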
  • 58. Numerical Solution NO Master Equation “Integrating” evolved into … – Well, in financial mathematics solutions are tuned on “stencils”. Numerical solutions are known. Master equation is not known, and is not interesting to know. “Master equation is not known” – this is ok. – But we need to be aware of emergent effects in complex systems: learning how to do something right on a small scale doesn’t necessarily imply that we’ll do the right things on a bigger scale
  • 59. Leibniz, Huygens, and Johann Bernoulli knew geometry and mechanics. We don't know the “geometry” and “mechanics” of techno-social systems (and we don’t even know the “geometry” and “mechanics” of semantic networks, social networks, …), but we can create small “nodal arrangements” modeling multidimensional networks (for instance, folksonomies), apply known and novel numerical algorithms, and utilize state-of-the-art knowledge to decide which algorithms provide better results. The next step is to check whether good properties of the numerical solutions on the micro-level hold true on the meso-level Source: Troussov at MITACS Workshop in Vancouver, Canada, 2010
  • 60. Recommender systems and global/local ranking Link analysis is frequently employed for ranking and navigation Graph-based recommender systems should recommend “important” objects (nodes, links, subgraphs) which are also – close enough to the initial points of interest (query, focus, initial seed) (for instance, in physical space) Global ranking ~ PageRank Breadth-first search (BFS)? Local ranking!? Recommending a suitable restaurant near NY 9th Avenue (next slide), or the music you might like, the advertisement you should see, etc.
  • 61. Graphics: http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/ 61 © 2011 Alexander Troussov
  • 62. Global Ranking (like Google’s PageRank) – a view of the network from an external point: the modern, “Copernican” approach Source: NOAA
  • 63. Local Ranking – is needed for recommenders – should rely on an ego-centered, Ptolemaic view (actually, poly-centered, see next slide) Ego-centered or “personal” networks provide Ptolemaic views of their networks from the perspective of the persons (egos) at the centers of their networks. Graphics: http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/
  • 64. Poly-Centric In physical space, navigation is from one point to another. In applications to virtual spaces, navigation is not simply browsing from a single object to another, but dealing with several objects at the same time. For instance, to get better results in Google we add terms, we remove terms, … To compute the recommendation “Whom to invite to the meeting”, one can start navigation from two objects: the user whom the recommendation is for and the meeting in question
  • 65. Graph-based recommender systems should recommend “important” objects (nodes) which are also located close to the initial points of interest (query, initial seed) One of the leading approaches in recommenders is: results of Global Ranking (link analysis) are “filtered” according to their proximity to the query In this paper we introduce novel algorithms which could replace the two-step procedure mentioned above with one step: Local Ranking, which simultaneously computes proximity and importance
  • 66. Web and Communities Communities in social sciences: a tribe learning to survive, a group of engineers working on similar problems, … Communities in computer science: any empirically found group of people “Recent advances in digital technologies invite consideration of organizing as a process that is accomplished by global, flexible, adaptive, and ad hoc networks that can be created, maintained, dissolved, and reconstituted with remarkable alacrity”. Prof. N. Contractor
  • 67. Community detection … but What is a Community? Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a practitioner? Yes. – Communities easily overlap: multiple membership and fuzzy belonging At the same time, some communities SHOULD be kept separate – Remember “Strange Case of Dr Jekyll and Mr Hyde” (Robert Louis Stevenson, 1886). • How Google failed to understand an essential property of real-world social networks • So by testing their social service inside a single context (Google employees only), the developers failed to notice that in real life, people participate in multiple contexts (family, work, friends, etc) that they work actively to keep separate. The reasons for wanting to keep these groups separate can range from wanting to keep an illicit affair secret from your spouse to political activists in oppressive regimes wanting to keep certain connections secret from the government. Another important reason to keep our communities separate is that we often play different roles and communicate differently http://www.iq.harvard.edu/blog/netgov/2010/03/worlds_colliding.html
  • 68. New methods for community detection are needed Multiple membership – Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a practitioner? Yes. … Fuzzy belonging – We don’t know the social structures behind on-line “communities”: members of an on-line community don’t necessarily have the sense of identity that members of real-life social communities have; on-line communities could be project teams or networks of knowledge, … High performance and scalability (agglomerative, local, …) – Clustering as simple partitioning is ruled out because of multiple membership – Clustering as partitioning is not possible in real time for many business applications • IBM Intranet: 400K employees, 10K on-line communities (the biggest has 23K members), ... Contextualisation of Community Detection – Collaborative filtering systems provide recommendations based on the detection of like-minded users. But the user of a techno-social system whom the prediction is for could be a “Mathematician”, “Irish”, etc., or a kind of Dr. Jekyll / Mr. Hyde person, etc. (see next slide)
  • 69. An example of clustering around a node using propagation 69 © 2011 Alexander Troussov
  • 70. 70 © 2011 Alexander Troussov
  • 71. Future work in local dynamic clustering Troussov et al. “Vectorised Spreading Activation” 2010 theorize that the future development of spreading activation (SA) methods might be driven by “physics-inspired” and “logic-inspired” algorithms – SA algorithms have roots in the numerical simulation of various physical phenomena, particularly by finite difference methods. – On the other hand, the iterative procedure of SA is essentially the same as the procedure that determines the new state of a cell in cellular automata such as Conway’s Game of Life. Although cellular automata usually operate on rectangular (cubic, etc.) grids, the extension to arbitrary networks is feasible. ~ Marker propagation, MajorClust, Chinese whispers graph clustering algorithm, …
  • 72. Conway's Game of Life 72 © 2011 Alexander Troussov
  • 73. Conway's Game of Life 73 © 2011 Alexander Troussov
  • 74. Conway's Game of Life 74 © 2011 Alexander Troussov
  • 75. Logic-inspired VSA Finite difference approximations to differential equations were one of the precursors of cellular automata (Stephen Wolfram, “A New Kind of Science”) and of the method of spreading activation (Troussov et al. 2009) Iterative computational procedures in cellular automata are the same as in SA. The identity of the computational procedures allows one to develop VSA algorithms with hybrid operations over the components of the activation vector. – For instance, “physical” operations could be responsible for the propagation of the activation around the initial seeds; the level of the activation indicates the relevancy of the nodes to the initial seeds. – “Logical” operations could propagate markers, which indicate the potential belonging of nodes to clusters. Such hybrid operations combine ranking with clustering, and are computationally efficient on massive networks since the most time-consuming operations (retrieval of nodes) serve both “physical” and “logical” operations. The clustering does not involve partitioning of the whole network.
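The hybrid idea can be sketched with a two-component node state: an activation level (the “physical” part) and a cluster marker (the “logical” part). The update rules below are assumptions made for illustration, loosely in the spirit of marker propagation and Chinese whispers; they are not the VSA algorithm of Troussov et al. The point of the sketch is that one pass over a node's neighbors serves both operations.

```python
def hybrid_step(adjacency, activation, marker, rate=0.5):
    """One synchronous update of both components of the node state."""
    new_activation, new_marker = {}, {}
    for node, neighbors in adjacency.items():
        inflow = 0.0
        votes = {}
        if marker[node] is not None:
            votes[marker[node]] = activation[node]
        for other in neighbors:  # a single retrieval of neighbors serves both operations
            inflow += activation[other]                # "physical": activation flows in
            if marker[other] is not None:              # "logical": markers are voted on
                votes[marker[other]] = votes.get(marker[other], 0.0) + activation[other]
        new_activation[node] = activation[node] + rate * inflow
        new_marker[node] = max(votes, key=votes.get) if votes else None
    return new_activation, new_marker


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
activation = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 1.0}
marker = {0: "A", 1: None, 2: None, 3: None, 4: None, 5: "B"}

for _ in range(3):
    activation, marker = hybrid_step(adjacency, activation, marker)

print(marker)
print(activation)
```

After a few steps the markers split the two triangles between the seeds, while the activation levels give a ranking of the same nodes; only the neighborhood of the seeds is ever touched.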
  • 76. VSA & Marker propagation – combining ranking with clustering My University An Expert A topic I’m interested in 76 © 2011 Alexander Troussov
  • 77. VSA & Clustering (Cont.) 77 © 2011 Alexander Troussov
  • 78. VSA & Clustering (Cont.) 78 © 2011 Alexander Troussov
  • 79. VSA & Clustering (Cont.) 79 © 2011 Alexander Troussov
  • 80. My University An Expert A topic I’m interested in 80 © 2011 Alexander Troussov
  • 81. Tasks / Methods Various terminology in various domains (for instance, from the point of view of IM many tasks fall into the category of hidden knowledge discovery) Columns: Multidimensional network point of view (A.T.) | Techno-Social Systems: tasks | Network Theory and Graph Theory terminology. Centralisation | Recommender systems, Expertise location | Random walks, PageRank etc., Eigenvector centrality. Local topology | Recommender systems, Link prediction | Motifs. Ad hoc generalisation across dimensions | Expertise location, Recommender systems | Clustering.
  • 82. Tasks Avenues to deep socio-semantic analytics and the possibility of high-quality functionalities for techno-social systems (like recommending people to invite into your social network) hinge on the availability of engines which are able – to provide hidden knowledge discovery, like • structural importance of nodes • discovering a new relation in a network (for instance, based on the strength of multiple connectivity between the nodes of a social network one can conclude that Dr. Jekyll is related to Mr. Hyde) • ad hoc generalisation across dimensions, for instance, the ability to detect that a particular person might serve as a representative of a community or as an expert on a particular topic (an example of such generalisation is the expression frequently attributed to Louis XIV: "L'État, c'est moi" (I am the State))
  • 83. “Three steps away”? John B. Axel P. Dan B. Tim B. Why did the recommender decide that this three-steps-away connection is a strong connection?
  • 84. John and Tim – The recommender computes that this is a strong connection because there are multiple ways of connecting them Shortest Path vs. Volume of traffic Friends-of-Friends Interest Workplace
  • 85. John and Derek The recommender computes that this type of connectivity is a weak connection
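The contrast between shortest path and volume of traffic can be made concrete by comparing the distance between two people with the number of short paths that connect them. The network below is hypothetical (the intermediate nodes for John and Derek, and the topic and office nodes, are invented), and counting simple paths is only one possible proxy for “volume”.

```python
from collections import deque


def distance(adjacency, source, target):
    """Length of the shortest path, found by breadth-first search."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for other in adjacency[node]:
            if other not in seen:
                seen[other] = seen[node] + 1
                queue.append(other)
    return None


def count_paths(adjacency, source, target, max_len=3):
    """Number of simple paths from source to target with at most max_len edges."""
    def walk(node, visited, remaining):
        if node == target:
            return 1
        if remaining == 0:
            return 0
        return sum(walk(other, visited | {other}, remaining - 1)
                   for other in adjacency[node] if other not in visited)
    return walk(source, {source}, max_len)


# Hypothetical multidimensional network: people, a shared interest and a workplace.
adjacency = {
    "John": ["Axel", "topic", "office", "Paul"],
    "Axel": ["John", "Dan"],
    "Dan": ["Axel", "Tim", "topic", "office"],
    "Tim": ["Dan"],
    "topic": ["John", "Dan"],
    "office": ["John", "Dan"],
    "Paul": ["John", "Quinn"],
    "Quinn": ["Paul", "Derek"],
    "Derek": ["Quinn"],
}

for person in ("Tim", "Derek"):
    print(person, "distance:", distance(adjacency, "John", person),
          "paths:", count_paths(adjacency, "John", person))
```

Both Tim and Derek are three steps away from John, but Tim is reachable in several independent ways (friends of friends, a shared interest, a shared workplace), which is what makes that connection strong.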
  • 86. Tasks: Generalisation Across Domains - Whom is Claudia connected with? All of these people Dirk Martin Claudia Elaine Researcher John Hanna 86 © 2011 Alexander Troussov
  • 87. Ranking 2 1 3 87 © 2011 Alexander Troussov
  • 88. Ranking 3 1 2 88 © 2011 Alexander Troussov
  • 89. Ranking 1 2 … … 89 © 2011 Alexander Troussov
  • 90. Nepomuk Recommender NEPOMUK (Networked Environment for Personalized, Ontology-based Management of Unified Knowledge) is an open-source software specification that is concerned with the development of a social semantic desktop that enriches and interconnects data from different desktop applications using semantic metadata stored as RDF. Initially, it was developed in the EU 6th framework integrated project Nepomuk (2006-2008) - 17 million Euros, of which 11.5 million was funded by the European Union 90 © 2011 Alexander Troussov
  • 91. Nepomuk Recommender (Cont.) Troussov et al. “Social Context as Machine Processable Knowledge” presented the architecture of the hybrid recommender system in the activity-centric environment Nepomuk-Simple (EU 6th Framework Project NEPOMUK). “Real” desktops usually have piles of things on them, where the users (consciously or unconsciously) have grouped together items which are related to each other or to a task. The so-called “Pile” UI used in Nepomuk-Simple imitates this type of data and metadata organisation, which helps to avoid premature categorisation and reduces the retention of useless documents. Metadata describing the user data are stored in the Nepomuk personal information management ontology (PIMO). Proper recommendations, such as the recommendation of additional items to add to the pile, apparently should be based on the PIMO and on the textual content of the items in the pile. Although methods of natural language processing for information retrieval could be useful, the most important types of textual processing are those which allow relating concepts in PIMO to the processed texts. Since PIMO changes over time, this type of natural language processing can’t be performed as preprocessing of all textual content related to the user. Hybrid recommendation needs on-the-fly textual processing with the ability to aggregate the current instantiation of PIMO with the results of textual processing.
  • 92. Nepomuk Representing and modeling this ontology as a multidimensional network allows one to augment the ontology on the fly with new information, such as the “semantic” content of the textual information in user documents. Recommendations in Nepomuk-Simple are computed on the fly by graph-based methods operating on the unified multidimensional network of concepts from the personal information management ontology, augmented with concepts extracted from the documents pertaining to the activity in question. Troussov et al. 2008 classify Nepomuk-Simple recommendations into two major types. – The first type is the recommendation of additional items for the pile when the user is working on an activity. – The second type arises, for instance, when the user is browsing the Web; Nepomuk-Simple can recommend that the current resource might be relevant to one or more activities performed by the user. In both cases there is a need to operate with Clouds (fuzzy sets of PIMO nodes): Clouds describe the topicality of documents in terms of PIMO, and the pile itself is a Cloud.
  • 93. Pile UI 93 © 2011 Alexander Troussov
  • 94. Nepomuk use case: activity management A user starts to work on a new project, CID. Using the Nepomuk SSD, she collects a “pile” of resources she needs while working on the project (MS-Word documents, contacts, etc.) by dragging and dropping resources from her desktop and by linking resources from e-mail (Mozilla Thunderbird) and web browser (Firefox) applications.
  • 95. Nepomuk use case: activity management using IBM recommender codenamed “Galaxy” Galaxy (IBM hybrid recommender) analyses the pile content and linkage structure as a multidimensional network of concepts extracted from documents and links between concepts, projects, project participants, meetings, document authors, …, and provides handy recommendations of resources she might possibly need
  • 96. Nepomuk use case: activity management Galaxy can spot what the user might miss: “This web page might be relevant to your CID activity” 96 © 2011 Alexander Troussov
  • 97. Thank you ! © 2011 Alexander Troussov