The document discusses techniques for analyzing networks and diffusion processes on networks. It describes breadth-first search to find distances between nodes, connected components to identify distinct subgraphs, triangle counting to measure clustering, and methods to model diffusion by identifying influencer relationships between nodes based on time-stamped adoption data.
8. # initialize distances
dist = {}
for u in G:
dist[u] = NA
dist [u0] = 0
d=0
periphery = { u0 }
while len(periphery) > 0:
# find nodes one step away from the periphery
next_level = {}
for u in periphery:
next_level += { w for w in neighbors[u] if dist[w] == NA }
# update distances
d += 1
for u in next_level:
dist[u] = d
# update periphery
periphery = next_level
9. BFS @ scale
undirected network
Input
edge list, starting node u0
Output
Distance to all nodes from u0
10. BFS @ scale
undirected network
Input: edge list, distances (u, d)
1. join distances with edge list
2. foreach (u, d, w) output (w, d+1)
[ also output (u0, 0) ]
3. group by w, and output min d
11. Connected Components
undirected network
Input
Edge list
Output
List of nodes for each component
14. Connected Component
A connected subset of nodes that is not
contained in any larger connected subset
15. Connected Components
undirected network
1. Select a node u0 that has not yet been assigned
2. BFS starting from u0
3. Record nodes reached by BFS
16. Consider the global human social network,
with an edge between every pair of friends
Is this network connected?
17. Consider the global human social network,
with an edge between every pair of friends
Is this network connected?
No – there are people with no (living) friends, who
are hence isolated from the rest of the network
18. Consider the global human social network,
with an edge between every pair of friends
Is there a “giant” connected component?
19. Consider the global human social network,
with an edge between every pair of friends
Is there more than one “giant” component?
20. Consider the global human social network,
with an edge between every pair of friends
Is there more than one “giant” component?
No – unlikely to have two
large disconnected sets of people
21. Consider the global human social network,
with an edge between every pair of friends
Is there more than one “giant” component?
No – unlikely to have two
large disconnected sets of people
Historically it was more likely
e.g., pre-Columbian America & Eurasia
22. Consider the global human social network,
with an edge between every pair of friends
On average, how far are people from one another?
23. The Small-world Experiment
Stanley Milgram, 1967
296 people were randomly selected in Omaha and Wichita
Packages sent to the selected individuals with instructions to
forward to a particular stock broker in Boston through a chain of
people they knew on a first-name basis.
24. The Small-world Experiment
Stanley Milgram, 1967
Of the 296 packages, 232 did not reach target
Of the 64 that did arrive, average path length was 6
“Six degrees of separation”
27. The Anatomy of the Facebook Social Graph
J. Ugander, B. Karrer, L. Backstrom, C. Marlow
721 million users, 69 billion edges
5 degrees of separation
28. Edge list degree distribution
undirected network
Input
Edge list
Output
Degree distribution
29. 3
1 2
5
4
7
6
Degree of node u
# of edges incident on u
30. Edge list degree distribution
undirected network
Map
input: (u, w)
output: (u, w), key := u
output: (w, u), key := w
Reduce
input: u, {w1, …, wk}
output: u, k
31. Edge list degree distribution
undirected network
Map
input: u, k
identity, key := k
Reduce
input: k, {u1, …, um}
output: k, m
32. An email network of 130M users
Edges indicate reciprocated communication
33. An email network of 130M users
Edges indicate reciprocated communication
(log-log plot)
37. Counting Triangles
undirected network
Input
adjacency list
Output
Number of triangles incident on each node
38. Counting Triangles
In memory
for u in nodes:
triangles[u] = 0
for w in neighbors[u]:
triangles[u] += len(neighbors[w] & neighbors[u])
triangles[u] = triangles[u] / 2
39. Counting Triangles
@ scale
Every node needs to know to which nodes it is connected
and
to which nodes its neighbors are connected
40. Counting Triangles
@ scale
Map
input: u {w1, …, wk}
foreach wi:
output wi u {w1, …, wk}
Reduce
In memory triangle count
41. Homophily
the tendency of individuals to associate with similar others
“birds of a feather flock together”
42. Birds of a Feather: Homophily in Social Networks
McPherson, Smith-Lovin, Cook
race, sex, age, religion, education,
occupation, social class, behaviors,
attitudes, abilities, aspirations
45. Computing Homophily
Input
Edge list, race of each individual
Output
Distribution of race among friends
White Black Latino Asian
White
Black
Latino
Asian
46. Computing Homophily
1. Join edges (u, v) by u, demographics (w, race) by w
2. Join edges (u, v, urace) by v, demographics (w, race) by w
47. Computing Homophily
1. Join edges (u, v) by u, demographics (w, race) by w
2. Join edges (u, v, urace) by v, demographics (w, race) by w
3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
48. Computing Homophily
1. Join edges (u, v) by u, demographics (w, race) by w
2. Join edges (u, v, urace) by v, demographics (w, race) by w
3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
4. Count edges in each group
5. Normalize the table
49. How do ideas and products
spread through society?
55. Computing the structure of diffusion
Compute the connected components for each
forest corresponding to a URL
56. Computing the structure of diffusion
Definition. Two (rooted) trees are isomorphic if they are
identical under a relabeling of the vertices.
57. x (x) (x, x) ((x)) (x, (x))
Basis. The canonical name c(T) for the one-node tree T is x.
Induction. If T has more than one node, let T1, . . . ,Tk denote the
subtrees of the root indexed such that c(T1) ≤ c(T2) ≤ · · · ≤ c(Tk)
under the lexicographic order. Then the canonical name for T is
(c(T1), . . . ,c(Tk)).
Aho et al. [1974]
58. Computing the structure of diffusion
Compute the canonical name for each
tree in the URL forests
62. Computing the structure of diffusion
1. Join adoptions (link, u, uts) by u, edges (u, w) by u
2. Join (link, u, uts, w) by (link, w),
adoptions (link, w, wts) by (link, w)
63. Computing the structure of diffusion
1. Join adoptions (link, u, uts) by u, edges (u, w) by u
2. Join (link, u, uts, w) by
(link, w), adoptions (link, w, wts) by
(link, w)
3. Group (link, u, uts, w, wts) by (link, u)
64. Computing the structure of diffusion
1. Join adoptions (link, u, uts) by u, edges (u, w) by u
2. Join (link, u, uts, w) by (link, w),
adoptions (link, w, wts) by (link, w)
3. Group (link, u, uts, w, wts) by (link, u)
4. Output unique “parent” edge (link, u, uts, w, wts) for each group