Visualization of evolutionary cascades of messages using force-directed graphs
1. Visualization of evolutionary cascades of messages using force-directed graphs
Artjom Kurapov
Supervisor: Helena Kruus
Master’s thesis defense, 9 May 2011
2. Agenda
Background
Practical work
Pling.ee, open-source Gephi
Web-tool demo and Twitter
3. Background
Types of networks
Properties / areas of application
Research interest
5. Goals
Visualize social networks (preferably in Estonia)
Compare friends and messages topology
Try to mine data visually using cascades
[Diagram with nodes A, B, C, D]
So, first a little introduction to the field, then some research I’ve done on a large dataset, then a browser tool I built myself: a small demo, its features and the issues I faced. And finally some results from a small Twitter dataset.
Networks are everywhere. Most of us here study technological and information networks. But there are also biochemical, ecological and, most interestingly, social networks, which influence our daily life. These include sexual connections, friendship networks, citations or any kind of social behavior associated with them. In fact, if you are strict about it, citation is not really social behavior, since it is directed and doesn’t imply talking to the real person. So it’s more like a network of document dependencies. So it is important how you define connections and objects.

Networks have different properties, some of which I list in the paper. And of course some of them are relevant only in one field: bipartite graphs are only needed if you want to visualize them, and cliques only if you want clique analysis done.

There are also different research interests, like drawing, or how networks evolve, or how they break apart, or where traffic goes through, or how we can solve all kinds of graph puzzles, like graph search, coloring or the travelling salesman problem.
So to visualize such a network and its processes, one needs to look at the surroundings of this field: sociology with its laws of diffusion and preferential attachment, network properties, drawing algorithms and their complexity, and of course the work that has been done before, both theoretical and practical, in the form of existing software.
As a thesis goal, I suggest mining data through frequency analysis of messages and making a network topology map. That means that we want a graph representation of a network, we want both friendship and message datasets, and then we want to see how they correlate and lead to higher forms of messages: cascades. My hypothesis is that cascades are parts of social thought. Thus evolutionary cascades are linked cascades across multiple topics.
So I have studied the Estonian social network pling.ee, which belongs to Elisa Eesti AS. On the left is the friendship network with 75 thousand users, and on the right the message network with 12 thousand users. As you can see, they are different, and assortative mixing is present: the red nodes here are Russian users and the blue ones are Estonian users. This was read from the messages and the symbols they used.
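To make "assortative mixing" a bit more concrete, here is a minimal Python sketch of Newman's assortativity coefficient for a categorical node attribute such as language. It is only an illustration on toy data, not the computation used for pling.ee; the function name and the `"ru"`/`"et"` labels are assumptions made for the example.

```python
from collections import defaultdict

def attribute_assortativity(edges, label):
    """Newman's assortativity coefficient r for a categorical attribute.

    edges: iterable of undirected (u, v) pairs.
    label: dict mapping each node to its category (hypothetical labels).
    r = 1 means nodes link only within their own category,
    r < 0 means they prefer the other category.
    Undefined (division by zero) if all nodes share one category.
    """
    # Count edge ends by category pair, in both directions (undirected graph).
    counts = defaultdict(float)
    for u, v in edges:
        counts[(label[u], label[v])] += 1
        counts[(label[v], label[u])] += 1
    total = sum(counts.values())
    categories = {c for pair in counts for c in pair}

    # Fraction of edge ends that stay inside one category.
    trace = sum(counts[(c, c)] for c in categories) / total
    # a[c]: fraction of edge ends attached to category c.
    a = {c: sum(counts[(c, d)] for d in categories) / total for c in categories}
    sum_sq = sum(value ** 2 for value in a.values())
    return (trace - sum_sq) / (1 - sum_sq)

# Toy example: two Russian-speaking and two Estonian-speaking users.
labels = {"a": "ru", "b": "ru", "c": "et", "d": "et"}
within = attribute_assortativity([("a", "b"), ("c", "d")], labels)   # only same-language links
across = attribute_assortativity([("a", "c"), ("b", "d")], labels)   # only cross-language links
```

A value close to 1 would correspond to the clearly separated red and blue groups on the slide.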
So the numbers differ as well. As you can see, since it was a small portion of messages, the network is rather young and has a bigger diameter. At the same time the average degree is smaller, which is natural, since people don’t talk to all of their friends. And the clustering coefficient is also smaller, which partially follows from that lower degree.
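The three quantities compared here (average degree, clustering coefficient, diameter) can be computed for a small undirected graph in a few lines of Python. This is a sketch on a toy edge list, not the pling.ee data; the function names are illustrative, and the diameter routine assumes a connected graph (on a disconnected one it returns the largest distance found inside any component).

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Number of edges among the neighbours of this node.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def diameter(adj):
    """Longest shortest path, found by a breadth-first search from every node."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best

# Toy graph: a triangle a-b-c with a tail c-d.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(average_degree(adj))   # 2.0
print(diameter(adj))         # 2
```

The all-pairs search is fine for small graphs; for networks with tens of thousands of nodes a tool such as Gephi is the more practical choice.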
The bad news for me was that I was not able to find a single cascade. Possibly this is because only around 14% of messages were sent from the browser and there was no explicit resharing function in the interface. But compare it to Twitter: people there invented RT themselves. Most likely it was the topic of discussion that didn’t stimulate sharing, since 89% of the talks were private and almost all users are teenagers discussing their love life.
So to study cascades and make a visualization, I’ve tried building my own tool, which is written in JavaScript and can draw small datasets along with their analysis. It is browser-based and supports navigation. I’ve also done two dataset extractions from Twitter.
From 12 thousand messages, around 7% can be considered a direct cascade. But there may be more, since I didn’t take into account normal posts in directed form, which can also lead to smaller forms of cascades. On the graph you can see how the depth of a retweet depends on its number in the dataset. (demo here)
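The "depth of a retweet" can be illustrated with a short Python sketch. It assumes each message records which message it reshares (a hypothetical `parent` mapping, not the format of the actual Twitter extraction) and that the resharing chains contain no cycles.

```python
def cascade_depths(parent):
    """Depth of every message in its resharing chain.

    parent: dict mapping message id -> id of the message it reshares,
            or None for an original post (hypothetical input format).
    Returns a dict id -> depth, where originals have depth 0.
    """
    depths = {}
    for msg in parent:
        # Walk up the chain until an original or an already solved message.
        chain = []
        cur = msg
        while cur is not None and cur not in depths:
            chain.append(cur)
            cur = parent.get(cur)
        base = -1 if cur is None else depths[cur]
        # Assign depths on the way back down.
        for node in reversed(chain):
            base += 1
            depths[node] = base
    return depths

# Toy data: m1 is reshared by m2, which is reshared by m3; m4 stands alone.
parent = {"m1": None, "m2": "m1", "m3": "m2", "m4": None}
print(cascade_depths(parent))   # {'m1': 0, 'm2': 1, 'm3': 2, 'm4': 0}
```

Plotting these depths against the message's position in the dataset gives a chart of the kind described above.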
1. I don’t talk about evolutionary networks, because I study static snapshots here, but in general a network does evolve from disconnected components into a GCC (giant connected component). But it depends on the network. For example, buyers in electronic shops, even though they may suggest products, don’t always bring in new customers with a connection, so customers are not connected to anyone. On the other hand, there may be certain clusters if there is some sort of affiliate network campaign.

P: polynomial complexity, T(n) = O(n^k).
NP: nondeterministic polynomial complexity. Nondeterministic automata can have multiple decision paths from a single state.
“NP-complete” problems have no known polynomial-time algorithm.
“NP-hard” problems are at least as hard as NP-complete ones.

2. Yes, in social networks the GCC diameter is maximal at the first stages of network evolution and decreases over time. I’m not so sure about other network types. That is because social networks do get denser. Since each new node can connect to 0, 1 or all nodes, alpha is ... So in the lowest case they grow linearly, with the exponent equal to 1, meaning like a tree. In the other case they can grow quadratically, with the exponent equal to 2, where each new node basically connects to all other nodes. So the more people join in, the more friends can know the other end of the graph. Thus a smaller diameter. If you think of technological networks, then I don’t think laying a cable from Japan to Brazil is so easy.

3. Markov centrality is one of the ways to find the most influential nodes in the network. Although it is very complex to compute, my work also lists other centrality measures. And I think that ...

4. Cascade analysis and data mining are still manual work.

5. I used the Fruchterman-Reingold and Yifan Hu algorithms for local forces and for adaptive cooling. I’ve added my own version of recursive force summing and presented it in the work.
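For reference, one iteration of the basic Fruchterman-Reingold scheme can be sketched in Python as below. This is the plain textbook version with naive pairwise repulsion (k²/d) and edge attraction (d²/k); it does not reproduce the recursive force summing or the Yifan Hu adaptive cooling mentioned above, and the function name and parameters are illustrative assumptions.

```python
import math

def fr_step(pos, edges, k, temperature):
    """One Fruchterman-Reingold iteration on 2D positions.

    pos: dict node -> (x, y); edges: list of (u, v) pairs;
    k: ideal edge length; temperature: maximum move per node in this step.
    Returns a new dict of positions.
    """
    disp = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)

    # Repulsion between every pair of nodes, magnitude k^2 / d.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k * k / d
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f

    # Attraction along edges, magnitude d^2 / k.
    for u, v in edges:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        disp[u][0] -= dx / d * f
        disp[u][1] -= dy / d * f
        disp[v][0] += dx / d * f
        disp[v][1] += dy / d * f

    # Move each node along its displacement, capped by the temperature.
    new_pos = {}
    for v in pos:
        dx, dy = disp[v]
        length = math.hypot(dx, dy)
        if length == 0:
            new_pos[v] = tuple(pos[v])
            continue
        step = min(length, temperature)
        new_pos[v] = (pos[v][0] + dx / length * step,
                      pos[v][1] + dy / length * step)
    return new_pos

# Usage: a simple linear cooling schedule over 50 iterations.
positions = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (0.0, 10.0)}
links = [("a", "b"), ("b", "c")]
for i in range(50):
    positions = fr_step(positions, links, k=1.0, temperature=1.0 * (1 - i / 50))
```

The pairwise repulsion loop is quadratic in the number of nodes, which is the cost that summing forces over groups of nodes is meant to reduce.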