Slide deck from my presentation at NYC's #Pydata 2012 conference - http://nyc2012.pydata.org/abstracts/#gephi
Talk abstract:
Are you interested in working with social data to map out communities and connections between friends, fans and followers? In this session I'll show how we use the Python NetworkX library together with the open-source Gephi visualization tool to make sense of social network data. We'll take a few examples from Twitter, look at how a hashtag spreads through the network, and then analyze the connections between users posting to the hashtag. We'll construct graphs, run statistics on them and then visualize the output.
9. • Node network properties
– from immediate connections
• indegree: how many directed edges (arcs) point into a node (example node in the figure: indegree=3)
• outdegree: how many directed edges (arcs) originate at a node (example: outdegree=2)
• degree (in + out): total number of edges incident on a node (example: degree=5)
– from the entire graph
• centrality (betweenness, closeness)
Source: Lada Adamic (SI508-F08)
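A minimal NetworkX sketch of the slide's example, assuming a small hypothetical directed graph in which node "A" has three incoming and two outgoing arcs. For a DiGraph, degree() is the sum of in- and out-degree. The whole-graph measures at the end depend on every shortest path, not only on A's neighbours; here A lies on all 6 paths from {B, C, D} to {E, F}, giving a normalized betweenness of 6/20 = 0.3.

```python
import networkx as nx

# Hypothetical directed graph matching the slide's example values.
G = nx.DiGraph()
G.add_edges_from([("B", "A"), ("C", "A"), ("D", "A"),  # arcs into A
                  ("A", "E"), ("A", "F")])             # arcs out of A

print(G.in_degree("A"))   # 3
print(G.out_degree("A"))  # 2
print(G.degree("A"))      # 5 (in + out for a DiGraph)

# Measures computed from the entire graph.
print(nx.betweenness_centrality(G)["A"])
print(nx.closeness_centrality(G)["A"])
```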
10. Example Graph Types
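• Complete Graph
– Every pair of vertices is connected by an edge
• Bipartite Graph
– Vertices can be divided into two disjoint sets, with every edge joining a vertex in one set to a vertex in the other
– Ex: students & schools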
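Both graph types can be sketched with standard NetworkX calls. The student and school names below are hypothetical. A complete graph on n nodes has n(n-1)/2 edges; projecting the bipartite graph onto the student set links two students whenever they share a school.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Complete graph: every pair of the 5 nodes is joined, 5*4/2 = 10 edges.
K5 = nx.complete_graph(5)
print(K5.number_of_edges())  # 10

# Bipartite graph of hypothetical students and schools.
students = ["ana", "ben", "cy"]
schools = ["NYU", "CUNY"]
B = nx.Graph()
B.add_nodes_from(students, bipartite=0)
B.add_nodes_from(schools, bipartite=1)
B.add_edges_from([("ana", "NYU"), ("ben", "NYU"),
                  ("ben", "CUNY"), ("cy", "CUNY")])
print(nx.is_bipartite(B))   # True
print(nx.is_bipartite(K5))  # False (K5 contains odd cycles)

# One-mode projection: students connected if they attend the same school.
classmates = bipartite.projected_graph(B, students)
print(list(classmates.edges()))
```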
12. Social Network Attributes
• Scale-free
– Degree distribution follows a power law
– Barabási et al. (’99): mapped the topology of a portion of the web
• Small world
– Most nodes are not neighbors, but can be reached in a small number of hops
– Watts & Strogatz (’98)
– Properties: cliques, subnetworks with a high clustering coefficient, most pairs of nodes connected by at least one short path
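NetworkX ships generators for both models, so the two properties can be checked on synthetic graphs. This is a sketch with arbitrarily chosen sizes and parameters (n=1000, m=2 for Barabási–Albert; k=10, p=0.1 for Watts–Strogatz). The scale-free graph shows hubs far above the mean degree; the small-world graph keeps far higher clustering than a random graph with the same number of edges while its average path length stays short.

```python
import networkx as nx

n = 1000

# Scale-free: preferential attachment, each new node links to m=2 existing nodes.
ba = nx.barabasi_albert_graph(n, 2, seed=42)
degrees = [d for _, d in ba.degree()]
mean_degree = sum(degrees) / n
print("BA mean degree:", mean_degree, "max degree:", max(degrees))

# Small world: ring lattice with k=10 neighbours, edges rewired with p=0.1.
ws = nx.connected_watts_strogatz_graph(n, 10, 0.1, seed=42)
# Random graph with the same node and edge counts, for comparison.
rnd = nx.gnm_random_graph(n, ws.number_of_edges(), seed=42)

ws_clust = nx.average_clustering(ws)
rnd_clust = nx.average_clustering(rnd)
ws_path = nx.average_shortest_path_length(ws)
print("WS clustering:", ws_clust, "random clustering:", rnd_clust)
print("WS average shortest path:", ws_path)
```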
13. (Zachary) Karate club graph
Social network of friendships between 34 members of a karate club at a US university in the 1970s.
Standard test network for clustering algorithms: during the observation period the club broke up into two separate clubs over a conflict.
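NetworkX bundles this graph as karate_club_graph(), with each node's "club" attribute recording the faction it joined ("Mr. Hi" or "Officer"). A minimal sketch comparing that ground truth with an unsupervised split; greedy modularity maximisation is used here as one example of a clustering algorithm, not necessarily the one used in the talk.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
print(G.number_of_nodes(), G.number_of_edges())  # 34 78

# Ground truth: group members by the faction recorded on each node.
factions = {}
for node, club in G.nodes(data="club"):
    factions.setdefault(club, set()).add(node)
print({club: len(members) for club, members in factions.items()})

# Unsupervised communities found by greedy modularity maximisation.
communities = greedy_modularity_communities(G)
for i, community in enumerate(communities):
    print(i, sorted(community))
```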
15. Graph Layout
• OpenOrd
– Better distinguishes clusters
• Yifan Hu
• Force Atlas
• Fruchterman-Reingold
– Models the graph as a system of mass particles (nodes: particles, edges: springs)
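The listed layouts run inside Gephi, but NetworkX's spring_layout is its own Fruchterman-Reingold implementation, and write_gexf exports a graph in a format Gephi opens directly. A sketch of both steps, assuming the karate club graph as input and a hypothetical output file name karate.gexf in the working directory.

```python
import networkx as nx

G = nx.karate_club_graph()

# Fruchterman-Reingold force-directed layout: edges pull neighbours
# together like springs while all nodes repel each other.
pos = nx.spring_layout(G, seed=7)
print(pos[0])  # (x, y) coordinates of node 0

# Export for Gephi, where OpenOrd, Yifan Hu or Force Atlas can be applied.
nx.write_gexf(G, "karate.gexf")
```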
21. Twitter Users with Python in their Bios
• 2 days of Twitter data (Oct 24th and 25th)
• Total: 4246 users (62k tweets)
• @mikanyan1 tweeted 795 times
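Statistics like the tweet count above, and a user-to-user graph to analyze, can be derived from raw tweets. A sketch assuming a tiny hypothetical list of (author, text) pairs in place of the real data set: tweets per user are tallied with Counter, and each @mention becomes a weighted directed edge from author to mentioned user.

```python
import re
from collections import Counter

import networkx as nx

# Hypothetical sample standing in for the collected tweets.
tweets = [
    ("alice", "Loving #python generators, thanks @bob"),
    ("bob", "@alice @carol networkx makes graphs easy"),
    ("alice", "Another day, another @carol pandas question"),
    ("carol", "Slides from #pydata are up"),
]

# Most active user, as in "@mikanyan1 tweeted 795 times".
tweet_counts = Counter(author for author, _ in tweets)
print(tweet_counts.most_common(1))  # [('alice', 2)]

MENTION = re.compile(r"@(\w+)")
G = nx.DiGraph()
for author, text in tweets:
    G.add_node(author)
    for mentioned in MENTION.findall(text):
        # Edge weight counts how often the author mentioned that user.
        if G.has_edge(author, mentioned):
            G[author][mentioned]["weight"] += 1
        else:
            G.add_edge(author, mentioned, weight=1)

# Most-mentioned users first.
print(sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True))
```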
27. Thank You
Gilad Lotan
Twitter: @gilgul
Github: giladlotan
Editor's Notes
Homophily
Endogenous trend – information spreading from within the network
Exogenous trend – information spread driven from outside the network
Hashtags have emerged as a way for people to gather around topics or events.
- Mitt Romney: #gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy
- Newt Gingrich: #palestine, #OWS, #immigration, #abortion (he famously said: “Stop whining, take a bath and get a job!”)
- Equal: #republican, #dems, #economics, #amnesty
Co-occurrence
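Co-occurrence lists like these come from counting which hashtags appear together in the same tweet. A sketch assuming hypothetical per-tweet hashtag sets: every pair in a tweet adds 1 to an undirected edge weight, and hashtags linked to both candidate tags are those in the intersection of their neighbourhoods.

```python
import itertools

import networkx as nx

# Hypothetical hashtag sets, one per tweet.
tweet_hashtags = [
    {"#romney", "#jobs", "#economy"},
    {"#gingrich", "#immigration"},
    {"#romney", "#economy"},
    {"#gingrich", "#romney", "#economics"},
]

G = nx.Graph()
for tags in tweet_hashtags:
    # Each unordered pair of hashtags in a tweet co-occurs once.
    for a, b in itertools.combinations(sorted(tags), 2):
        weight = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=weight)

print(G["#romney"]["#economy"]["weight"])  # 2
# Hashtags that co-occur with both candidates.
print(sorted(set(G["#romney"]) & set(G["#gingrich"])))  # ['#economics']
```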
NetworkX supports
Zachary's Karate Club Graph describes the friendships between the members of a US karate club in the 1970s. The significant feature of this social network is that the club president and the instructor were involved in a dispute (some might say: a fight) over the issue of how much to charge for lessons. This split the club into two factions, one centred around the president, and the other centred around the instructor.
Betweenness – number of shortest paths between all pairs of vertices that pass through that node / positioning
Closeness – how long it takes to spread information from s to all other nodes sequentially / distance of s from all other actors in a network
Eigenvector – measure of the influence of a node (as in PageRank, connections to high-scoring nodes contribute more to the score)
Clustering coefficient – measure of the degree to which nodes in a graph tend to cluster together (1 when a node's neighbors form a clique)
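Each of these measures has a NetworkX function. A sketch on the karate club graph that reports the top-scoring node per measure, then checks two of the definitions by hand: closeness as (n - 1) divided by the sum of distances from s (valid for a connected graph), and a clustering coefficient of 1 for a node inside a clique.

```python
import networkx as nx

G = nx.karate_club_graph()
measures = {
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G),
    "clustering": nx.clustering(G),
}
for name, scores in measures.items():
    top = max(scores, key=scores.get)
    print(f"{name:12s} top node {top} ({scores[top]:.3f})")

# Closeness by hand: (n - 1) / total shortest-path distance from s.
s = 0
dists = nx.single_source_shortest_path_length(G, s)
manual = (len(G) - 1) / sum(dists.values())
print(manual, measures["closeness"][s])

# In a clique every pair of neighbours is linked, so clustering is 1.
K4 = nx.complete_graph(4)
print(nx.clustering(K4, 0))  # 1.0
```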
NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks.
NetworkX was born in May 2002. The original version was designed and written by Aric Hagberg, Dan Schult, and Pieter Swart in 2002 and 2003. The first public release was in April 2005.