Computational Social Science, Lecture 06: Networks, Part II

Networks
Part II

Sharad Goel
Columbia University
Computational Social Science: Lecture 6

March 1, 2013

Corporate E-mail Communication
[ Adamic & Adar, 2004 ]
via Easley & Kleinberg

Networks/Graphs

Nodes/vertices
people, organizations, webpages, computers

Edges
represent connections between pairs of nodes

Distance
Length of the shortest path between two nodes

Breadth-first Search
iteratively explore nodes one layer at a time

# initialize distances
dist = {}
for u in G:
dist[u] = NA

dist [u0] = 0

d=0
periphery = { u0 }
while len(periphery) > 0:
# find nodes one step away from the periphery
next_level = {}
for u in periphery:
next_level += { w for w in neighbors[u] if dist[w] == NA }

# update distances
d += 1
for u in next_level:
dist[u] = d

# update periphery
periphery = next_level

BFS @ scale
undirected network

Input
edge list, starting node u0

Output
Distance to all nodes from u0

BFS @ scale
undirected network

Input: edge list, distances (u, d)
1. join distances with edge list
2. foreach (u, d, w) output (w, d+1)
[ also output (u0, 0) ]
3. group by w, and output min d

Connected Components
undirected network

Input
Edge list

Output
List of nodes for each component

Connected Graph
There is a path between every pair of nodes

Connected Component
A connected subset of nodes that is not
contained in any larger connected subset

Connected Components
undirected network

1. Select a node u0 that has not yet been assigned
2. BFS starting from u0
3. Record nodes reached by BFS

Consider the global human social network,
with an edge between every pair of friends

Is this network connected?


Is this network connected?

No – there are people with no (living) friends, who
are hence isolated from the rest of the network


Is there a “giant” connected component?


Is there more than one “giant” component?



No – unlikely to have two
large disconnected sets of people



No – unlikely to have two
large disconnected sets of people

Historically it was more likely
e.g., pre-Columbian America & Eurasia


On average, how far are people from one another?

The Small-world Experiment
Stanley Milgram, 1967

296 people were randomly selected in Omaha and Wichita

Packages sent to the selected individuals with instructions to
forward to a particular stock broker in Boston through a chain of
people they knew on a first-name basis.

The Small-world Experiment
Stanley Milgram, 1967

Of the 296 packages, 232 did not reach target

Of the 64 that did arrive, average path length was 6

“Six degrees of separation”

Small-world phenomenon

Is “six degrees” big or small?

Small-world phenomenon

navigational vs. topological

The Anatomy of the Facebook Social Graph
J. Ugander, B. Karrer, L. Backstrom, C. Marlow

721 million users, 69 billion edges
5 degrees of separation

Edge list  degree distribution
undirected network

Input
Edge list

Output
Degree distribution

3
1 2

5

4

7
6

Degree of node u
# of edges incident on u

undirected network

Map
input: (u, w)
output: (u, w), key := u
output: (w, u), key := w

Reduce
input: u, {w1, …, wk}
output: u, k

undirected network

Map
input: u, k
identity, key := k

Reduce
input: k, {u1, …, um}
output: k, m

An email network of 130M users
Edges indicate reciprocated communication

An email network of 130M users
Edges indicate reciprocated communication
(log-log plot)

Triadic closure

1. Opportunity
2. Incentive
3. Commonality

Counting Triangles
undirected network

Input
adjacency list

Output
Number of triangles incident on each node

Counting Triangles
In memory

for u in nodes:
triangles[u] = 0
for w in neighbors[u]:
triangles[u] += len(neighbors[w] & neighbors[u])

triangles[u] = triangles[u] / 2

Counting Triangles
@ scale

Every node needs to know to which nodes it is connected
and
to which nodes its neighbors are connected

Counting Triangles
@ scale

Map
input: u {w1, …, wk}
foreach wi:
output wi u {w1, …, wk}

Reduce
In memory triangle count

Homophily
the tendency of individuals to associate with similar others

“birds of a feather flock together”

Birds of a Feather: Homophily in Social Networks
McPherson, Smith-Lovin, Cook

race, sex, age, religion, education,
occupation, social class, behaviors,
attitudes, abilities, aspirations

Homophily

1. Preference
2. Influence
3. Opportunity

Computing Homophily

Input
Edge list, race of each individual

Output
Distribution of race among friends

White Black Latino Asian
White
Black
Latino
Asian

Computing Homophily

1. Join edges (u, v) by u, demographics (w, race) by w
2. Join edges (u, v, urace) by v, demographics (w, race) by w

Computing Homophily

3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])

Computing Homophily

3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
4. Count edges in each group
5. Normalize the table

How do ideas and products
spread through society?

The structure of diffusion

93% 5% 1% 0.3% 0.3%

Computing the structure of diffusion

Input
Twitter network
Time-stamped “adoptions” of 1B URLs

Output
Distribution of cascade structures


We assume v influenced u to adopt link t if
v is the last of u’s contacts to adopt before u

2
3

5

9


Draw a labeled edge from v to u

3

t
5


Group edges by their labels (URL)


Compute the connected components for each
forest corresponding to a URL


Definition. Two (rooted) trees are isomorphic if they are
identical under a relabeling of the vertices.

x (x) (x, x) ((x)) (x, (x))

Basis. The canonical name c(T) for the one-node tree T is x.

Induction. If T has more than one node, let T1, . . . ,Tk denote the
subtrees of the root indexed such that c(T1) ≤ c(T2) ≤ · · · ≤ c(Tk)
under the lexicographic order. Then the canonical name for T is
(c(T1), . . . ,c(Tk)).
Aho et al. [1974]


Compute the canonical name for each
tree in the URL forests


Count the number of trees of each type


1. Join adoptions (link, u, uts) by u, edges (u, w) by u
2. Join (link, u, uts, w) by (link, w),
adoptions (link, w, wts) by (link, w)


2. Join (link, u, uts, w) by
(link, w), adoptions (link, w, wts) by
(link, w)
3. Group (link, u, uts, w, wts) by (link, u)


2. Join (link, u, uts, w) by (link, w),
adoptions (link, w, wts) by (link, w)
3. Group (link, u, uts, w, wts) by (link, u)
4. Output unique “parent” edge (link, u, uts, w, wts) for each group

Computational Social Science, Lecture 06: Networks, Part II

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (20)

Similar to Computational Social Science, Lecture 06: Networks, Part II

Similar to Computational Social Science, Lecture 06: Networks, Part II (20)

More from jakehofman

More from jakehofman (17)

Recently uploaded

Recently uploaded (20)

Computational Social Science, Lecture 06: Networks, Part II