1. Introduction to Networks
Methods & Measures
James Moody
jmoody77@soc.duke.edu
Duke Network Analysis Center
Department of Sociology
http://www.soc.duke.edu/~jmoody77/SNH/SNH.html
Or
bit.ly/2561VXN
Internet:
1)Select “Visitor network”
2)Open Browser
3)Accept terms & conditions
3. Introduction
We live in a connected world:
“To speak of social life is to speak of the association between
people – their associating in work and in play, in love and in
war, to trade or to worship, to help or to hinder. It is in the
social relations men establish that their interests find
expression and their desires become realized.”
Peter M. Blau
Exchange and Power in Social Life, 1964
4. *1934, NYTimes. Moreno claims this work was covered in “all the major papers” but I can’t find any other clips…
Introduction
We live in a connected world:
"If we ever get to the point of charting a whole city or a whole nation, we would have … a picture
of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does
in space. Such an invisible structure underlies society and has its influence in determining the
conduct of society as a whole."
J.L. Moreno, New York Times, April 13, 1933
5. But scientists are starting to take networks seriously:
“Networks”
Introduction
11. And yet, standard social science analysis methods do not take this space
into account.
“For the last thirty years, empirical social research has been
dominated by the sample survey. But as usually practiced, …, the
survey is a sociological meat grinder, tearing the individual from his
social context and guaranteeing that nobody in the study interacts
with anyone else in it.”
Allen Barton, 1968 (Quoted in Freeman 2004)
Moreover, the complexity of the relational world makes it impossible to
identify social connectivity using only our intuition.
Social Network Analysis (SNA) provides a set of tools to empirically
extend our theoretical intuition of the patterns that compose social
structure.
Introduction
12. Social network analysis is:
•a set of relational methods for systematically
understanding and identifying connections among actors.
SNA
•is motivated by a structural intuition based on ties
linking social actors
•is grounded in systematic empirical data
•draws heavily on graphic imagery
•relies on the use of mathematical and/or computational
models.
•Social Network Analysis embodies a range of theories
relating types of observable social spaces and their relation
to individual and group behavior.
Introduction
13. Introduction
Key Questions
Social Network analysis lets us answer questions about social interdependence.
These include:
“Networks as Variables” approaches
•Are kids with smoking peers more likely to smoke themselves?
•Do unpopular kids get in more trouble than popular kids?
•Do central actors control resources?
“Networks as Structures” approaches
•What generates hierarchy in social relations?
•What network patterns spread diseases most quickly?
•How do role sets evolve out of consistent relational activity?
Both: Connectionist vs. Positional features of the network
We don’t want to draw this line too sharply: emergent role positions can
affect individual outcomes in a ‘variable’ way, and variable approaches
constrain relational activity.
14. How do we best use these rapidly developing data and methods to promote health and
wellbeing?
Goals for today
1) Two mechanisms for networks & health: connections & positions
2) Basic network data structures
• How are network data different from standard data
• Volume measures
3) Measuring network properties
• Reachability
• Distance
• Redundancy
• Centrality
• Triad Distributions/hierarchy
• Structural Equivalence
4) Software sneak peek
Introduction
Connections
Positions
15. Why do networks matter?
Two fundamental mechanisms: Problem space
Connectionist: Networks as pipes – networks matter because of what flows through them.
Positional: Networks as roles – networks matter because of relational patterns.
Crossed with: Networks as Cause vs. Networks as Result
This rubric is organized around social mechanisms – the reasons why networks matter,
which ends up being loosely correlated with specific types of measures & analyses.
16. Why do networks matter?
Two fundamental mechanisms: Connections
Connectionist network mechanisms : Networks matter because of the
things that flow through them. Networks as pipes.
17. The spread of any epidemic depends on the number of
secondary cases per infected case, known as the
reproductive rate ($R_0$). $R_0$ depends on the probability that
a contact will be infected over the duration of contact ($\beta$),
the likelihood of contact ($c$), and the duration of
infectiousness ($D$).
$R_0 = \beta c D$
For STI, the trick is specifying c, which depends on the network.
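As a worked illustration of the formula, the short Python sketch below multiplies the three terms. The function name and all parameter values are hypothetical, chosen only to show how changing the contact rate c moves R0 across the epidemic threshold of 1.

```python
def reproductive_rate(beta, c, duration):
    """R0 = beta * c * D: expected secondary cases per infected case."""
    return beta * c * duration

# Hypothetical values for illustration, not estimates from any study.
beta = 0.1      # probability that a contact is infected
duration = 5.0  # duration of infectiousness (same time unit as c)

for c in (1.0, 2.0, 4.0):  # contact rate per unit time
    r0 = reproductive_rate(beta, c, duration)
    status = "epidemic can grow" if r0 > 1 else "epidemic does not grow"
    print(f"c = {c}: R0 = {r0:.2f} ({status})")
```

Because c is the only term here that depends on who is tied to whom, the network enters the calculation through that one parameter.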
Why do networks matter?
Two fundamental mechanisms: Connections example
18. Isolated vision
Why do networks matter?
Two fundamental mechanisms: Connections example
19. Connected vision
Why do networks matter?
Two fundamental mechanisms: Connections example
20. Partner
Distribution
Component
Size/Shape
Emergent Connectivity in “low-degree” networks
Connections: Diffusion
Example: Small local changes can create cohesion
cascades
Based on work supported by R21-HD072810 (NICHD, Moody PI), R01 DA012831-05 (NIDA Morris, Martina PI)
21. Provides food for
Romantic Love
Bickers with
Why do networks matter?
Two fundamental mechanisms: Positions
Positional network mechanisms : Networks matter because of the way
they capture role behavior and social exchange. Networks as Roles.
22. Parent Parent
Child
Child
Child
Provides food for
Romantic Love
Bickers with
Why do networks matter?
Two fundamental mechanisms: Positions
Positional network mechanisms : Networks matter because of the way
they capture role behavior and social exchange. Networks as Roles.
23. Why do networks matter?
Two fundamental mechanisms: Problem space
Connectionist (networks as pipes):
  Networks as Cause: Diffusion; Peer influence; Social Capital; “small worlds”
  Networks as Result: Social integration; Peer selection; Homophily; Network robustness
Positional (networks as roles):
  Networks as Cause: Popularity Effects; Role Behavior; Network Constraint
  Networks as Result: Group stability; Network ecology; “Structuration”
This rubric is organized around social mechanisms – the reasons why networks matter,
which ends up being loosely correlated with specific types of measures & analyses.
24. Social network analysis is:
•a set of relational methods for systematically understanding
and identifying connections among actors. SNA
•is motivated by a structural intuition based on ties linking
social actors
•is grounded in systematic empirical data
•draws heavily on graphic imagery
•relies on the use of mathematical and/or computational
models.
•Social Network Analysis embodies a range of theories
relating types of observable social spaces and their relation to
individual and group behavior.
Network Methods & Measures
26. The unit of interest in a network is the combined set of
actors and their relations.
We represent actors with points and relations with lines.
Actors are referred to variously as:
Nodes, vertices or points
Relations are referred to variously as:
Edges, Arcs, Lines, Ties
Example:
a
b
c e
d
Social Network Data
27. In general, a relation can be:
Binary or Valued
Directed or Undirected
a
b
c e
d
Undirected, binary Directed, binary
a
b
c e
d
a
b
c e
d
Undirected, Valued Directed, Valued
a
b
c e
d
1 3
4
21
Social Network Data
28. In general, a relation can be: (1) Binary or Valued (2) Directed or Undirected
Social Network Data
Basic Data Elements
The social process of interest will often determine what form your data take. Conceptually, almost
all of the techniques and measures we describe can be generalized across data format, but you may
have to do some of the coding work yourself….
a
b
c e
d
Directed,
Multiplex categorical edges
29. We can examine networks across multiple levels:
1) Ego-network
- Have data on a respondent (ego) and the people they are connected to
(alters). Example: 1985 GSS module
- May include estimates of connections among alters
2) Partial network
- Ego networks plus some amount of tracing to reach contacts of
contacts
- Something less than full account of connections among all pairs of
actors in the relevant population
- Example: CDC Contact tracing data for STDs
Social Network Data
Basic Data Elements: Levels of analysis
30. 3) Complete or “Global” data
- Data on all actors within a particular (relevant) boundary
- Never exactly complete (due to missing data), but boundaries are
set
-Example: Coauthorship data among all writers in the social
sciences, friendships among all students in a classroom
We can examine networks across multiple levels:
Social Network Data
Basic Data Elements: Levels of analysis
32. A good network drawing allows viewers to come away from the image with an almost
immediate intuition about the underlying structure of the network being displayed.
However, because there are multiple ways to display the same information, and standards
for doing so are few, the information content of a network display can be quite variable.
Now trace the actual pattern of ties.
You will see that these 4 graphs are
exactly the same.
Consider the 4 graphs drawn at right.
After asking yourself what intuition
you gain from each graph, click on
the screen.
Social Network Data
Graph Layout (teaser)
33. Network visualization helps build intuition, but you have to keep the drawing
algorithm in mind. Here we show the same graphs with two different techniques:
Tree-Based layouts
Most effective for very sparse,
regular graphs. Very useful
when relations are strongly
directed, such as organization
charts or internet connections.
Spring embedder layouts
Most effective with graphs that have a strong
community structure (clustering, etc). Provides a very
clear correspondence between social distance and
plotted distance
Two images of the same network
(good) (Fair - poor)
Social Network Data
Graph Layout
35. In general, graphs are cumbersome to work with analytically, though there is a
great deal of good work to be done on using visualization to build network
intuition.
I recommend using layouts that optimize on the feature you are most interested
in. The two I use most are hierarchical layouts and force-directed layouts.
We’ll go into much more detail in the visualization seminar.
Basic Data Structures
Social Network Data
36. Social Network Data
Social network data are substantively divided by the number of
modes in the data.
1-mode data represents edges based on direct contact between
actors in the network. All the nodes are of the same type (people,
organizations, ideas, etc.). Examples: communication, friendship,
giving orders, sending email.
This is commonly
what people think
about when
thinking about
networks: nodes
having direct
relations with
each other.
37. Social Network Data
Social network data are substantively divided by the number of
modes in the data.
2-mode data represents nodes from two separate classes, where
all ties are across classes. Examples:
People as members of groups
People as authors on papers
Words used often by people
Events in the life history of people
The two modes of the data represent a duality: you can project
the data as people connected to people through joint membership
in a group, or groups to each other through common membership
There may be multiple relations of multiple types connecting
your nodes.
38. Bipartite networks imply a constraint on the mixing, such that ties only cross classes.
Here we see a tie connecting each woman with the party she attended (Davis data)
Social Network Data
Basic Data Elements: Modes
39. Social Network Data
Basic Data Elements: Modes
Bipartite networks imply a constraint on the mixing, such that ties only cross classes.
Here we see a tie connecting each woman with the party she attended (Davis data)
40. By projecting the data, one can look at the groups shared between people or the
members that groups have in common: this is the person-to-person projection of the 2-mode data.
Social Network Data
Basic Data Elements: Modes
41. Social Network Data
Basic Data Elements: Modes
By projecting the data, one can look at the groups shared between people or the
members that groups have in common: this is the group-to-group projection of the 2-mode data.
42. Social Network Data
Example of a 2-mode
network: faculty
supervising students
- Any list of what
people do –
meetings, clubs,
activities, co-
authorship, – that
they do with others
forms network data.
Moody
43. The Movement of Carbapenem-Resistant Klebsiella pneumoniae among Healthcare Facilities: A Network Analysis
D van Duin, F Perez, E Cober, SS Richter, RC Kalayjian, RA Salata, N Scalera, R Watkins, Y Doi, S Evans, VG Fowler Jr, KS Kaye, SD Rudin, KM Hujer, AM Hujer,
RA Bonomo, and J Moody for the Antibacterial Resistance Leadership Group
Social Network Data
Example of a 2-mode network: Patients & Care Settings
44. Casalino, Lawrence P., Michael F. Pesko, Andrew M. Ryan, David J. Nyweide, Theodore J. Iwashyna, Xuming Sun, Jayme Mendelsohn and James
Moody. “Physician Networks and Ambulatory Care Admissions” Medical Care 53:534-41
Social Network Data
Example of a 2-mode network: Patients & Care Settings
45. From pictures to matrices
a
b
c e
d
Undirected, binary
    a b c d e
a   0 1 0 0 0
b   1 0 1 0 0
c   0 1 0 1 1
d   0 0 1 0 1
e   0 0 1 1 0
For an undirected graph, the corresponding matrix is symmetric.
Because network images are hard to work
with, we often use an adjacency matrix to
represent the network.
The matrix (X) at right represents an
undirected binary network. Each node (a-e)
is listed on both the row and the column.
The cell in the ith row and the jth column ($X_{ij}$) records the
value of a tie from node i to node j. For
example, the line between nodes a and b is
represented as an entry in the first row and
second column (red at right).
Because the graph is undirected, the ties sent
are the same as the ties received, so every entry
above the diagonal equals the corresponding entry below
the diagonal.
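To make the picture-to-matrix step concrete, here is a minimal Python sketch. It assumes the five ties a–b, b–c, c–d, c–e, d–e read off the deck's example graph; the variable and function names are illustrative only.

```python
nodes = ["a", "b", "c", "d", "e"]
# Ties assumed from the example graph used throughout the deck.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]

index = {name: k for k, name in enumerate(nodes)}
n = len(nodes)

# Start from an all-zero matrix and record each tie in both directions,
# because an undirected tie is both sent and received.
X = [[0] * n for _ in range(n)]
for i, j in edges:
    X[index[i]][index[j]] = 1
    X[index[j]][index[i]] = 1

def is_symmetric(matrix):
    """True when every X[i][j] equals X[j][i]."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(size) for j in range(size))

print("  " + " ".join(nodes))
for name, row in zip(nodes, X):
    print(name, " ".join(str(v) for v in row))
print("symmetric:", is_symmetric(X))
```

Dropping the second assignment inside the loop would turn the same code into a directed matrix, where symmetry is no longer guaranteed.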
Basic Data Structures
Social Network Data
46. Directed, binary
a
b
c e
d
    a b c d e
a   0 1 0 0 0
b   1 0 0 0 0
c   0 1 0 1 1
d   0 0 0 0 0
e   0 0 1 1 0
The adjacency matrix of a directed graph, on the other hand,
is generally asymmetric.
We can see that Xab =1 and Xba =1,
therefore a “sends” to b and b “sends” to a.
However, Xbc=0 while Xcb=1; therefore,
c “sends” to b, but b does not reciprocate.
Basic Data Structures
Social Network Data
47. a b c d e
a
b
c
d
e
1
3
1 2 4
2 1
Basic Data Structures
Social Network Data
Directed, Valued
a
b
c e
d
48. From matrices to lists (binary)
Adjacency matrix:
    a b c d e
a   0 1 0 0 0
b   1 0 1 0 0
c   0 1 0 1 1
d   0 0 1 0 1
e   0 0 1 1 0
Adjacency List:
a: b
b: a c
c: b d e
d: c e
e: c d
Arc List:
a b
b a
b c
c b
c d
c e
d c
d e
e c
e d
Social network analysts also use adjacency lists and arc lists
to more efficiently store network data.
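The conversion from a matrix to the two list formats can be sketched in a few lines of Python. The function names are hypothetical; the matrix is the undirected binary example from this slide.

```python
nodes = ["a", "b", "c", "d", "e"]
X = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]

def to_adjacency_list(matrix, names):
    """One entry per node: the list of nodes it sends a tie to."""
    return {names[i]: [names[j] for j, v in enumerate(row) if v]
            for i, row in enumerate(matrix)}

def to_arc_list(matrix, names):
    """One (sender, receiver, value) triple per non-zero cell."""
    return [(names[i], names[j], v)
            for i, row in enumerate(matrix)
            for j, v in enumerate(row) if v]

adjacency = to_adjacency_list(X, nodes)
arcs = to_arc_list(X, nodes)

for node, contacts in adjacency.items():
    print(node, " ".join(contacts))
for sender, receiver, value in arcs:
    print(sender, receiver, value)
```

The efficiency gain comes from skipping the zeros: the matrix stores n x n cells regardless of how many ties exist, while the arc list stores one row per tie.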
a
b
c e
d
Basic Data Structures
Social Network Data
49. From matrices to lists (valued)
Adjacency matrix:
    a b c d e
a   0 1 0 0 0
b   1 0 2 0 0
c   0 2 0 3 5
d   0 0 3 0 1
e   0 0 5 1 0
Adjacency List:
a: b
b: a c
c: b d e
d: c e
e: c d
Arc List (sender, receiver, value):
a b 1
b a 1
b c 2
c b 2
c d 3
c e 5
d c 3
d e 1
e c 5
e d 1
Social network analysts also use adjacency lists and arc lists
to more efficiently store network data.
a
b
c e
d
Basic Data Structures
Social Network Data
Valued adjacency list (contact | value):
a: b | 1
b: a c | 1 2
c: b d e | 2 3 5
d: c e | 3 1
e: c d | 5 1
50. Working with two-mode data
A person-to-group adjacency matrix is rectangular, with persons down
rows and groups across columns
1 2 3 4 5
A 0 0 0 0 1
B 1 0 0 0 0
C 1 1 0 0 0
D 0 1 1 1 1
E 0 0 1 0 0
F 0 0 1 1 0
A =
Each column is a group,
each row a person, and
the cell = 1 if the person in
that row belongs to that
group.
You can tell how many
groups two people both
belong to by comparing
the rows: Identify every
place that both rows = 1,
sum them, and you have
the overlap.
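The row comparison described above can be written directly. This is a minimal sketch using the matrix A from this slide; the function name is illustrative.

```python
# Person-by-group matrix A from the slide: rows are persons A-F,
# columns are groups 1-5.
A = {
    "A": [0, 0, 0, 0, 1],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 0, 0, 0],
    "D": [0, 1, 1, 1, 1],
    "E": [0, 0, 1, 0, 0],
    "F": [0, 0, 1, 1, 0],
}

def overlap(row_i, row_j):
    """Count the groups where both rows equal 1."""
    return sum(1 for x, y in zip(row_i, row_j) if x == 1 and y == 1)

print("D and F share", overlap(A["D"], A["F"]), "groups")
print("A and B share", overlap(A["A"], A["B"]), "groups")
```

Calling `overlap` on a row with itself returns the number of groups that person belongs to.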
Basic Data Structures
Social Network Data
51. One can get either projection easily with a little matrix multiplication.
First define $A^T$ as the transpose of A (simply reverse the rows and
columns). If A is of size P x G, then $A^T$ will be of size G x P.
$A^T_{ij} = A_{ji}$
1 2 3 4 5
A 0 0 0 0 1
B 1 0 0 0 0
C 1 1 0 0 0
D 0 1 1 1 1
E 0 0 1 0 0
F 0 0 1 1 0
A =
A B C D E F
1 0 1 1 0 0 0
2 0 0 1 1 0 0
3 0 0 0 1 1 1
4 0 0 0 1 0 1
5 1 0 0 1 0 0
$A^T$ =
Working with two-mode data
Basic Data Structures
Social Network Data
52. $G = A^T A$
1 2 3 4 5
A 0 0 0 0 1
B 1 0 0 0 0
C 1 1 0 0 0
D 0 1 1 1 1
E 0 0 1 0 0
F 0 0 1 1 0
A =
A B C D E F
1 0 1 1 0 0 0
2 0 0 1 1 0 0
3 0 0 0 1 1 1
4 0 0 0 1 0 1
5 1 0 0 1 0 0
$A^T$ =
A is 6x5 and $A^T$ is 5x6.
$A^T A = G$
(5x6)(6x5) = (5x5)
G
1 2 3 4 5
1 2 1 0 0 0
2 1 2 1 1 1
3 0 1 3 2 1
4 0 1 2 2 1
5 0 1 1 1 2
$A A^T = P$
(6x5)(5x6) = (6x6)
P
A B C D E F
A 1 0 0 1 0 0
B 0 1 1 0 0 0
C 0 1 2 1 0 0
D 1 0 1 4 1 2
E 0 0 0 1 1 1
F 0 0 0 2 1 2
$P = A A^T$
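Both projections can be checked with a small pure-Python sketch. The helper names `transpose` and `matmul` are illustrative; any matrix library would do the same job.

```python
# Person-by-group matrix A from the slides (6 persons x 5 groups).
A = [
    [0, 0, 0, 0, 1],  # person A
    [1, 0, 0, 0, 0],  # person B
    [1, 1, 0, 0, 0],  # person C
    [0, 1, 1, 1, 1],  # person D
    [0, 0, 1, 0, 0],  # person E
    [0, 0, 1, 1, 0],  # person F
]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Plain matrix product: rows of x times columns of y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

A_T = transpose(A)
G = matmul(A_T, A)  # group-to-group projection, 5 x 5
P = matmul(A, A_T)  # person-to-person projection, 6 x 6

for row in G:
    print(row)
for row in P:
    print(row)
```

The diagonal of G gives each group's size and the diagonal of P gives each person's number of memberships; off-diagonal cells are the overlaps.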
Basic Data Structures
Social Network Data
53. Basic Volume Measures
Social Network Data
Density: Mean of the adjacency matrix
Valued network:
    a  b  c  d  e
a   -  1  0  0  0
b   1  -  2  0  0
c   0  2  -  3  5
d   0  0  3  -  1
e   0  0  5  1  -
Binary network:
    a  b  c  d  e
a   -  1  0  0  0
b   1  -  1  0  0
c   0  1  -  1  1
d   0  0  1  -  1
e   0  0  1  1  -
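A minimal sketch of the density calculation follows. It assumes the mean is taken over off-diagonal cells only, since the slide marks the diagonal with dashes; the function name is illustrative.

```python
binary = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
valued = [
    [0, 1, 0, 0, 0],
    [1, 0, 2, 0, 0],
    [0, 2, 0, 3, 5],
    [0, 0, 3, 0, 1],
    [0, 0, 5, 1, 0],
]

def density(matrix):
    """Mean of the off-diagonal cells of a square adjacency matrix."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

print("binary density:", density(binary))
print("valued density:", density(valued))
```

For binary data the result is the proportion of possible ties that are present; for valued data it is the average tie value.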
54. Basic Volume Measures
Social Network Data
Degree: Number of links adjacent to a node
            a b c d e | Out-degree
a           0 1 0 0 0 | 1
b           1 0 0 0 0 | 1
c           0 1 0 1 1 | 3
d           0 0 0 0 0 | 0
e           0 0 1 1 0 | 2
In-degree   1 2 1 2 1 | 7
55. Basic Volume Measures
Social Network Data
Weighted Degree: Sum of links adjacent to a node
            a b c d e | Weighted out-degree
a           0 1 0 0 0 | 1
b           1 0 0 0 0 | 1
c           0 2 0 3 1 | 6
d           0 0 0 0 0 | 0
e           0 0 5 1 0 | 6
Weighted in-degree
            1 3 5 4 1 | 14
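Degree and weighted degree are just row and column sums of the adjacency matrix, as this short sketch shows. The two matrices are filled in to match the row and column sums on the degree slides; the exact cell positions are an assumption, as are the function names.

```python
nodes = ["a", "b", "c", "d", "e"]

# Directed binary network (rows send, columns receive).
X = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
# Directed valued network with the same ties.
W = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 2, 0, 3, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 5, 1, 0],
]

def out_degree(matrix):
    """Row sums: ties (or tie values) sent by each node."""
    return [sum(row) for row in matrix]

def in_degree(matrix):
    """Column sums: ties (or tie values) received by each node."""
    return [sum(col) for col in zip(*matrix)]

for name, d_out, d_in in zip(nodes, out_degree(X), in_degree(X)):
    print(name, "out:", d_out, "in:", d_in)
print("weighted out:", out_degree(W), "weighted in:", in_degree(W))
```

In an undirected network the two functions return the same values, so a single degree per node is enough.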
56. For example only 3.9% of kids are in the most popular quintile all 5 waves, and a
full 50% are in the top quintile at least once over the observation period. Similarly,
only 1.9% of kids are least popular all 5 waves, but 43% are least popular at least
once.
W.T. Grant Foundation 8316 & NIDA R01DA018225 (Osgood), The Impact of School-Based Prevention on Friendship Networks and Peer Influence
Basic Volume Measures matter
Social Network Data
57. This figure
represents the
distribution of
cases across that
space, with key
points labeled.
Social Network Data
58. The regression
estimates define a
simple 2-d space
of slope (Y) and
intercept (x).
This region
represents steady
& sharp increases
in popularity, from
a low starting point
to a high ending
point. These kids
are upwardly
mobile
Dynamics of networks
Popularity Effects
61. At slope=0, we
see the main
effect of
popularity as
increasing
along the X
axis:
There is a strong and steady increase
in use as popularity goes up.
Dynamics of networks
Popularity Effects
62. When slope is
positive, kids
are becoming
more popular,
and we see an
increase in
use…
Dynamics of networks
Popularity Effects
63. But even
stronger use
among those
with decreasing
popularity.
Dynamics of networks
Popularity Effects
64. Mental health and network size
Network size matters for depressive symptoms, but not linearly
Falci, Christina, and Clea McNeely. 2013. “Too Many Friends: Social Integration, Network Cohesion, and Adolescent Depressive Symptoms.” Social Forces
87(4):2031–61.
65. Health benefits of network growth
The risk of functional impairment among older adults decreases as they have more confidants in their
personal networks
Predicted probabilities of different degrees of functional impairment
among older adults, by the number of confidants added
Cornwell, Benjamin, and Edward O. Laumann. 2015. "The health benefits of network growth: New evidence from a national survey of older adults." Social Science &
Medicine 125: 94-106.
67. Two features of the network’s topology are key to connectionist aspects of
networks: connectivity and centrality
Connectivity refers to how actors in one part of the network are connected to
actors in another part of the network.
• Reachability: Is it possible for actor i to reach actor j? This can only be
true if there is a chain of contact from one actor to another.
• Distance: Given they can be reached, how many steps are they from
each other?
• Redundancy: How many different paths connect each pair?
Measuring Networks: Connectionist properties
68. These features combine in important ways to define characteristic features of
networks :
Measuring Networks: Connectionist properties
Degree x Reachability
Emergent Connectivity
Distance x (Local) Redundancy
Small Worlds
Reachability x redundancy
Network Resilience
69. d e
c
Indirect connections are what make networks systems. One actor can
reach another if there is a path in the graph connecting them.
a
b
c e
d
f
b f
a
Paths can be directed, leading to a distinction between strong and weak
components
Measuring Networks: Connectivity
Reachability
70. Basic elements in connectivity
• A path is a sequence of nodes and edges starting with one node and
ending with another, tracing the indirect connection between the two.
On a path, you never go backwards or revisit the same node twice.
Example: a b c d
• A walk is any sequence of nodes and edges, and may go backwards.
Example: a b c b c d
• A cycle is a path that starts and ends with the same node.
Example: a b c a
Measuring Networks: Connectivity
Reachability
71. Reachability
If you can trace a sequence of relations from one actor to another,
then the two are reachable. If there is at least one path connecting
every pair of actors in the graph, the graph is connected and is called
a component.
Intuitively, a component is the set of people who are all connected by
a chain of relations.
Measuring Networks: Connectivity
Reachability
73. Because relations can be directed or undirected, components come in two flavors.
For a graph with any directed edges, there are two types of components:
Strong components consist of the set(s) of all nodes that are mutually
reachable.
Weak components consist of the set(s) of all nodes that are connected
when the direction of the ties is ignored.
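The two component types can be sketched with a simple reachability search. This is an illustration rather than the algorithm any particular package uses: strong components are found by checking mutual reachability, and weak components are computed by ignoring tie direction, the usual graph-theoretic definition. The arcs are assumed from the deck's small directed example, and all names are hypothetical.

```python
from collections import deque

# Directed ties assumed from the small directed example graph.
arcs = [("a", "b"), ("b", "a"), ("c", "b"), ("c", "d"),
        ("c", "e"), ("e", "c"), ("e", "d")]
nodes = sorted({n for arc in arcs for n in arc})

def reachable(start, successors):
    """All nodes that can be reached from start (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def strong_components(nodes, arcs):
    """Group nodes that can reach each other in both directions."""
    succ = {}
    for i, j in arcs:
        succ.setdefault(i, []).append(j)
    reach = {n: reachable(n, succ) for n in nodes}
    comps, assigned = [], set()
    for n in nodes:
        if n not in assigned:
            comp = {m for m in reach[n] if n in reach[m]}
            comps.append(sorted(comp))
            assigned |= comp
    return comps

def weak_components(nodes, arcs):
    """Group nodes connected once tie direction is ignored."""
    both = {}
    for i, j in arcs:
        both.setdefault(i, []).append(j)
        both.setdefault(j, []).append(i)
    comps, assigned = [], set()
    for n in nodes:
        if n not in assigned:
            comp = reachable(n, both)
            comps.append(sorted(comp))
            assigned |= comp
    return comps

print("strong:", strong_components(nodes, arcs))
print("weak:  ", weak_components(nodes, arcs))
```

Running one search per node is fine for small examples; network packages typically use linear-time algorithms for large graphs.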
Measuring Networks: Connectivity
Reachability
74. There are only 2 strong
components with more
than 1 person in this
network.
Components are the
minimum requirement for
social groups. As we will
see later, they are
necessary but not
sufficient
All of the major network analysis
software identifies strong and weak
components
Measuring Networks: Connectivity
Reachability
75. Many large networks are characterized by a highly skewed distribution of the
number of partners (degree)
Measuring Networks: Connectivity
Reachability x Degree Distribution Scale Free Networks
76. Many large networks are characterized by a highly skewed distribution of the
number of partners (degree)
$p(k) \sim k^{-\lambda}$
Measuring Networks: Connectivity
Reachability x Degree Distribution Scale Free Networks
77. The scale-free model focuses on the distance-reducing
capacity of high-degree nodes:
Measuring Networks: Connectivity
Reachability x Degree Distribution Scale Free Networks
78. Measuring Networks: Connectivity
Reachability x Degree Distribution Scale Free Networks
The scale-free model focuses on the distance-reducing
capacity of high-degree nodes:
If a preferential
attachment model is
active, then high-degree
nodes are hubs, that if
removed can disconnect
the network.
Since hubs are rare, PA
networks are robust to
random attack, but very
fragile to targeted
attack.
79. Colorado Springs High-Risk
(Sexual contact only)
• Network is approximately scale-free, with λ = -1.3
• But connectivity does not depend on the hubs.
• Preferential attachment implies a scale-free distribution, but a
scale-free distribution does not imply preferential attachment.
Scale Free Networks
Measuring Networks: Large-Scale Models
80. Geodesic distance is measured by the smallest (weighted) number of relations separating a pair:
Actor “a” is:
1 step from 4 actors
2 steps from 5 actors
3 steps from 4 actors
4 steps from 3 actors
5 steps from 1 actor
Measuring Networks: Connectivity
Distance
81. a b c d e f g h i j k l m
------------------------------------------
a. . 1 2 . . . . . . . . 2 1
b. 3 . 1 . . . . . . . . 1 2
c. . . . . . . . . . . . . .
d. 4 3 1 . 1 2 1 . 2 . . 2 3
e. 3 2 2 1 . 1 2 . 1 . . 1 2
f. 4 3 3 2 1 . 3 . 2 . . 2 3
g. 5 4 4 3 2 1 . . 3 . . 3 4
h. . . . . . . . . 1 . . . .
i. . . . . . . . . . . . . .
j. . . . . . . . . 1 . . . .
k. . . . . . . . . 1 . . . .
l. 2 1 2 . . . . . . . . . 1
m. 1 2 3 . . . . . . . . 1 .
When the graph is directed,
distance is also directed
(distance to vs distance from),
following the direction of the tie.
Measuring Networks: Connectivity
Distance
82. Reachability in Colorado Springs
(Sexual contact only)
(Node size = log of degree)
•High-risk actors over 4 years
•695 people represented
•Longest path is 17 steps
•Average distance is about 5 steps
•Average person is within 3 steps of
75 other people
Measuring Networks: Connectivity
Distance
84. Calculating distance in global networks: Breadth-First Search
In large networks, matrix multiplication is just too slow. A breadth-
first search algorithm works by walking through the graph, reaching
all nodes from a particular start node.
Distance is calculated directly in most SNA software packages.
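A breadth-first search from one start node can be sketched as follows. The adjacency list is the deck's small undirected example, and the function name is illustrative.

```python
from collections import deque

adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}

def bfs_distances(adjacency, start):
    """Geodesic distance from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:       # first visit is the shortest route
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

print(bfs_distances(adjacency, "a"))
```

Nodes missing from the returned dictionary are unreachable from the start node. Each search touches every node and tie at most once, which is why this scales better than repeated matrix multiplication.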
Measuring Networks: Connectivity
Distance
85. As a graph statistic, the distribution of distance can tell you a good deal about
how close people are to each other (we’ll see this more fully when we get to
closeness centrality).
The diameter of a graph is the longest geodesic, giving the maximum distance.
We often use l, the mean distance between every pair, to characterize the
entire graph.
For example, all else equal, we would expect rumors to travel faster through
settings where the average distance is small.
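Both summary statistics follow from running a search from every node. The sketch below uses the deck's small undirected example; it skips unreachable pairs, which is one of several possible conventions and is an assumption here.

```python
from collections import deque

adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}

def bfs_distances(adjacency, start):
    """Geodesic distance from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def distance_summary(adjacency):
    """Return (diameter, mean distance) over all reachable pairs."""
    lengths = []
    for start in adjacency:
        dist = bfs_distances(adjacency, start)
        lengths.extend(d for node, d in dist.items() if node != start)
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)

diameter, mean_distance = distance_summary(adjacency)
print("diameter:", diameter)
print("mean distance:", mean_distance)
```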
Measuring Networks: Connectivity
Distance
86. For a real network, people’s friends are not random, but clustered. We can
modify the random equation by adjusting a, such that some portion of the
contacts are random, the rest not. This adjustment is a ‘bias’ (i.e., a non-random
element in the model) that gives rise to the notion of ‘biased networks’. People
have studied (mathematically) biases associated with:
•Race (and categorical homophily more generally)
•Transitivity (Friends of friends are friends)
•Reciprocity (i--> j, j--> i)
There is still a great deal of work to be done in this area empirically, and it
promises to be a good way of studying the structure of very large networks.
Measuring Networks: Connectivity
Redundancy Local
87. Measuring Networks: Connectivity
Redundancy Local
Local redundancy is
known as “clustering”
or “transitivity” - that
one’s friends are friends
with each other.
Density is the
proportion of pairs tied,
excluding ego.
Transitivity is the
proportion of two-step
ties that are closed
(Friend of a Friend is a
friend)
Density   Transitivity   Transitivity (no ego)
0         0              0
0.4       0.71           1.0
1         1              1
0.7       0.78           0.64
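The two local redundancy measures can be sketched on the deck's small undirected example. The code computes transitivity for the whole graph (closed two-step ties over all two-step ties) and density among one ego's alters; it does not reproduce the ego networks pictured on the slide, and the function names are illustrative.

```python
from itertools import combinations

adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}

def transitivity(adj):
    """Share of two-step ties i-j-k in which i and k are also tied."""
    two_paths = closed = 0
    for center, neighbors in adj.items():
        for i, k in combinations(sorted(neighbors), 2):
            two_paths += 1
            if k in adj[i]:
                closed += 1
    return closed / two_paths if two_paths else 0.0

def ego_density(adj, ego):
    """Proportion of pairs of ego's alters that are tied, excluding ego."""
    pairs = list(combinations(sorted(adj[ego]), 2))
    if not pairs:
        return 0.0
    tied = sum(1 for i, k in pairs if k in adj[i])
    return tied / len(pairs)

print("transitivity:", transitivity(adjacency))
print("density around c:", ego_density(adjacency, "c"))
```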
88. Node Connectivity
As size of cut-set
0 1 2 3
Structural Cohesion:
A network’s structural cohesion is equal to the minimum number of
actors who, if removed from the network, would disconnect it.
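For small graphs the definition can be applied literally by trying every possible cut-set, smallest first. The brute-force sketch below is for illustration only (it is far too slow for large networks, where flow-based algorithms are used), and the function names are hypothetical.

```python
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """True if the nodes left after removing `removed` form one component."""
    remaining = [n for n in adj if n not in removed]
    if len(remaining) <= 1:
        return True
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor not in removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(remaining)

def structural_cohesion(adj):
    """Smallest number of nodes whose removal disconnects the graph."""
    nodes = list(adj)
    for k in range(len(nodes) - 1):
        for cut in combinations(nodes, k):
            if not is_connected(adj, set(cut)):
                return k
    # A complete graph cannot be disconnected; n - 1 is the usual convention.
    return len(nodes) - 1

example = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}

print("example graph:", structural_cohesion(example))
print("four-node ring:", structural_cohesion(ring))
```

A graph that is already disconnected returns 0, matching the lowest value on the slide's scale.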
Measuring Networks: Connectivity
Redundancy Global: Structural Cohesion
89. 0 1 2 3
Node Connectivity
As number of node-independent paths
Measuring Networks: Connectivity
Redundancy Global: Structural Cohesion
Structural Cohesion:
A network’s structural cohesion is equal to the minimum number of
actors who, if removed from the network, would disconnect it.
90. 1
2
3
4
Nestedness Structure
Cohesive Blocks Depth
Sociogram
5
Cohesive Blocking
The arrangement of subsequently more connected sets by
branches and depth uniquely characterize the connectivity
structure of a network
Measuring Networks: Connectivity
Redundancy Global: Structural Cohesion
91. Distance & Connectivity measures “locate” a node based on particular features
of the path structure, but there are many other ways of locating nodes in
networks.
Centrality refers to (one dimension of) location, identifying where an actor
resides in a network.
• For example, we can compare actors at the edge of the network to actors
at the center.
• In general, this is a way to formalize intuitive notions about the
distinction between insiders and outsiders.
As a terminology point, some authors distinguish centrality from prestige based on the
directionality of the tie. Since the formulas are the same in every other respect, I stick with
“centrality” for simplicity.
Measuring Networks
Centrality
92. Conceptually, centrality is fairly straightforward: we want to identify
which nodes are in the ‘center’ of the network. In practice, identifying
exactly what we mean by ‘center’ is somewhat complicated, but
substantively we often have reason to believe that people at the center are
very important.
The standard centrality measures capture a wide range of “importance” in
a network:
•Degree
•Closeness
•Betweenness
•Eigenvector / Power measures
After discussing these, I will describe measures that combine features of
each of them.
Measuring Networks
Centrality
93. The most intuitive notion of centrality focuses on degree. Degree is
the number of direct contacts a person has. The idea is that the actor
with the most ties is the most important:
$C_D(n_i) = d(n_i) = X_{i+} = \sum_j X_{ij}$
Measuring Networks
Centrality
95. If we want to measure the degree to which the graph as a whole is centralized,
we look at the dispersion of centrality:
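Simple: variance of the individual centrality scores.
$S_D^2 = \left[ \sum_{i=1}^{g} \left( C_D(n_i) - \bar{C}_D \right)^2 \right] / g$
Or, using Freeman’s general formula for centralization (which ranges from 0 to 1):
$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{(g-1)(g-2)}$
where $C_D(n^*)$ is the largest degree centrality observed in the network.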
UCINET, SPAN, PAJEK and most other network software will calculate these measures.
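Both dispersion measures are easy to compute by hand for a small graph. The sketch below applies them to the deck's small undirected example and to a five-node star; the function names are illustrative and the code assumes an undirected graph with at least three nodes.

```python
example = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
# A star is the most centralized graph possible: one hub tied to everyone.
star = {"hub": {"w", "x", "y", "z"},
        "w": {"hub"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

def degree_centrality(adj):
    """C_D(n_i): number of direct contacts of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}

def degree_variance(adj):
    """Variance of the individual degree scores."""
    scores = list(degree_centrality(adj).values())
    g = len(scores)
    mean = sum(scores) / g
    return sum((s - mean) ** 2 for s in scores) / g

def freeman_centralization(adj):
    """Sum of gaps to the largest degree, divided by (g-1)(g-2)."""
    scores = list(degree_centrality(adj).values())
    g = len(scores)
    top = max(scores)
    return sum(top - s for s in scores) / ((g - 1) * (g - 2))

print("degrees:", degree_centrality(example))
print("variance:", degree_variance(example))
print("centralization:", freeman_centralization(example))
print("star centralization:", freeman_centralization(star))
```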
Measuring Networks
Centrality
97. A second measure of centrality is closeness centrality. An actor is considered
important if he/she is relatively close to all other actors.
Closeness is based on the inverse of the distance of each actor to every other
actor in the network.
Closeness Centrality:
$C_C(n_i) = \left[ \sum_{j=1}^{g} d(n_i, n_j) \right]^{-1}$
Normalized Closeness Centrality:
$C'_C(n_i) = C_C(n_i)\,(g-1)$
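A minimal sketch of both closeness scores, using breadth-first search for the distances. It assumes a connected, undirected graph (otherwise some distances are undefined); the example graph and function names are illustrative.

```python
from collections import deque

adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}

def bfs_distances(adj, start):
    """Geodesic distance from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness(adj, node, normalized=False):
    """Inverse of the summed distance from node to all other nodes."""
    dist = bfs_distances(adj, node)
    total = sum(d for other, d in dist.items() if other != node)
    score = 1.0 / total
    # Normalizing multiplies by (g - 1), so the maximum possible score is 1.
    return score * (len(adj) - 1) if normalized else score

for node in adjacency:
    print(node, round(closeness(adjacency, node, normalized=True), 3))
```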
Measuring Networks
Centrality
101. Betweenness Centrality:
Model based on communication flow: A person who lies on
communication paths can control communication flow, and is thus important.
Betweenness centrality counts the number of shortest paths between i and k
that actor j resides on.
Measuring Networks
Centrality
102. Betweenness Centrality:
$C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}$
Where $g_{jk}$ = the number of geodesics connecting j and k, and
$g_{jk}(n_i)$ = the number that actor i is on.
Usually normalized by:
$C'_B(n_i) = C_B(n_i) / [(g-1)(g-2)/2]$
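The formula can be applied directly by counting geodesics with a breadth-first search from every node. This is a plain, unoptimized sketch for an undirected graph, not the faster algorithm that network packages typically use; all names are illustrative.

```python
from collections import deque
from itertools import combinations

adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}

def bfs_counts(adj, start):
    """Distances from start and the number of geodesics to each node."""
    dist = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest route to w
                paths[w] += paths[v]
    return dist, paths

def betweenness(adj):
    """C_B(n_i): sum over pairs j<k of g_jk(n_i) / g_jk."""
    info = {s: bfs_counts(adj, s) for s in adj}
    scores = {n: 0.0 for n in adj}
    for j, k in combinations(sorted(adj), 2):
        dist_j, paths_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, paths_i = info[i]
            # i is on a j-k geodesic when the two legs add up exactly.
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                scores[i] += paths_j[i] * paths_i[k] / paths_j[k]
    return scores

raw = betweenness(adjacency)
g = len(adjacency)
normalized = {n: s / ((g - 1) * (g - 2) / 2) for n, s in raw.items()}
print(raw)
print(normalized)
```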
105. Information Centrality:
It is quite likely that information can flow through paths other than the geodesic. The
Information Centrality score uses all paths in the network, and weights them based on their length.
106. Comparing across these 3 centrality values
•Generally, the 3 centrality types will be positively correlated
•When they are not (or only weakly) correlated, it probably tells you something interesting about the network.
High Degree:
•with Low Closeness: Embedded in a cluster that is far from the rest of the network
•with Low Betweenness: Ego's connections are redundant; communication bypasses him/her
High Closeness:
•with Low Degree: Key player tied to important/active alters
•with Low Betweenness: Probably multiple paths in the network; ego is near many people, but so are many others
High Betweenness:
•with Low Degree: Ego's few ties are crucial for network flow
•with Low Closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.
107. Bonacich Power Centrality: Actor’s centrality (prestige) is equal to a function of
the prestige of those they are connected to. Thus, actors who are tied to very
central actors should have higher prestige/ centrality than those who are not.
$$C(\alpha, \beta) = \alpha (I - \beta R)^{-1} R \mathbf{1}$$
• α is a scaling constant, which is set to normalize the score.
• β reflects the extent to which you weight the centrality of people ego is
tied to.
•R is the adjacency matrix (can be valued)
•I is the identity matrix (1s down the diagonal)
•1 is a column vector of ones.
108. Bonacich Power Centrality:
The magnitude of β reflects the radius of power. Small values of β weight
local structure, larger values weight global structure.
If β is positive, then ego has higher centrality when tied to people who are
central.
If β is negative, then ego has higher centrality when tied to people who are
not central.
As β approaches zero, you get degree centrality.
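The effect of the sign of β can be checked numerically. The sketch below is a pure-Python illustration that solves (I - βR)x = R1 by elimination and then rescales. Two things are assumptions of the sketch rather than statements from the slides: α is chosen so that the squared scores sum to the number of actors (one common convention), and β must be smaller in magnitude than the reciprocal of the largest eigenvalue of R for the measure to be well behaved.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("I - beta*R is singular for this beta")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def bonacich_power(R, beta):
    """alpha * (I - beta R)^-1 R 1, with alpha set so sum of squares = n."""
    n = len(R)
    row_sums = [float(sum(row)) for row in R]            # R 1
    lhs = [[(1.0 if i == j else 0.0) - beta * R[i][j] for j in range(n)]
           for i in range(n)]                             # I - beta R
    raw = solve(lhs, row_sums)
    alpha = (n / sum(x * x for x in raw)) ** 0.5
    return [alpha * x for x in raw]


# Hypothetical line network a - b - c - d - e as an adjacency matrix.
R = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]

for beta in (0.0, 0.3, -0.3):
    print(beta, [round(x, 3) for x in bonacich_power(R, beta)])
```

With β = 0 the scores are proportional to degree. With β = 0.3 the middle actor c scores highest, and with β = -0.3 actors b and d overtake c because they are tied to the weakly connected end points.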
111. An Example of the triad census
Type Number of triads
---------------------------------------
1 - 003 21
---------------------------------------
2 - 012 26
3 - 102 11
4 - 021D 1
5 - 021U 5
6 - 021C 3
7 - 111D 2
8 - 111U 5
9 - 030T 3
10 - 030C 1
11 - 201 1
12 - 120D 1
13 - 120U 1
14 - 120C 1
15 - 210 1
16 - 300 1
---------------------------------------
Sum (2 - 16): 63
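A quick arithmetic check on the table, sketched in Python: a triad census classifies every set of three nodes exactly once, so the 16 counts must add up to the number of possible triples, C(g, 3). The dictionary below simply re-enters the counts from the table; the variable names are illustrative.

```python
from math import comb

# Counts copied from the triad census table above.
census = {
    "003": 21, "012": 26, "102": 11, "021D": 1, "021U": 5, "021C": 3,
    "111D": 2, "111U": 5, "030T": 3, "030C": 1, "201": 1, "120D": 1,
    "120U": 1, "120C": 1, "210": 1, "300": 1,
}

non_empty = sum(count for kind, count in census.items() if kind != "003")
total = sum(census.values())

# Find the network size g whose number of triples matches the total.
g = next(n for n in range(3, 100) if comb(n, 3) == total)

print(non_empty, total, g)
```

The total across all 16 types matches the number of triples in a network of 9 nodes, so the table is internally consistent.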
Measuring Networks
Triads
112. As with undirected graphs, you can use the type of triads allowed
to characterize the total graph. But now the potential patterns are
much more diverse.
1) All triads are 030T:
A perfect linear hierarchy.
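The linear-hierarchy case can be verified with a short Python sketch. It assumes a directed graph stored as a set of (sender, receiver) arcs; the helper names are illustrative, and the check only separates 030T from 030C rather than computing the full 16-type census.

```python
from itertools import combinations


def man_counts(arcs, triple):
    """Count (mutual, asymmetric, null) dyads among three nodes."""
    m = a = n = 0
    for u, v in combinations(triple, 2):
        ties = ((u, v) in arcs) + ((v, u) in arcs)
        if ties == 2:
            m += 1
        elif ties == 1:
            a += 1
        else:
            n += 1
    return m, a, n


def is_030T(arcs, triple):
    """True if the triple has three asymmetric ties that form no cycle."""
    if man_counts(arcs, triple) != (0, 3, 0):
        return False
    out_degrees = sorted(
        sum((u, v) in arcs for v in triple if v != u) for u in triple
    )
    return out_degrees == [0, 1, 2]  # a 3-cycle (030C) would give [1, 1, 1]


# Hypothetical perfect linear hierarchy on 5 actors: i sends a tie to every
# actor ranked below it.
hierarchy = {(i, j) for i in range(5) for j in range(5) if i < j}
cycle = {(0, 1), (1, 2), (2, 0)}

print(all(is_030T(hierarchy, t) for t in combinations(range(5), 3)))
print(is_030T(cycle, (0, 1, 2)))
```

Every triple in the hierarchy passes the 030T check, while the 3-cycle has the same dyad counts but fails it.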
Measuring Networks
Triads: Estimating Macro Structure
113. Cluster Structure, allows triads: {003, 300, 102}
[Figure: cluster structure diagram with ties labeled M and N*]
Eugene Johnsen (1985, 1986) specifies a number of structures that result from various triad configurations.
114. PRC{300,102, 003, 120D, 120U, 030T, 021D, 021U} Ranked Cluster:
[Figure: ranked-cluster diagram with ties labeled M, N* and A*, and a 0/1 block image]
And many more...
116. [Figure: family network of two parents and three children, with three relations: provides food for, romantic love, bickers with]
Positional network mechanisms: Networks matter because of the way
they capture role behavior and social exchange. Networks as Roles.
Measuring Networks
Structural Equivalence
117. White et al: From logical role systems to empirical social structures
The key idea is that we can express a role through a relation (or set of relations)
and thus a social system by the inventory of roles. If roles equate to positions in
an exchange system, then we need only identify particular aspects of a position.
But what aspect? Block modeling focuses on equivalence positions.
Structural Equivalence
Two actors are structurally
equivalent if they have the same
types of ties to the same people.
That is, they have the exact same
ties.
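A minimal Python sketch of this definition for a single directed relation stored as a 0/1 adjacency matrix. It sets aside the ties between the two actors being compared, which is one convention among several; the function names and the small family example are hypothetical.

```python
def structurally_equivalent(adj_matrix, i, j):
    """True if i and j send and receive identical ties to every third actor."""
    n = len(adj_matrix)
    for k in range(n):
        if k in (i, j):
            continue  # ties between i and j themselves are set aside here
        if adj_matrix[i][k] != adj_matrix[j][k]:  # ties sent
            return False
        if adj_matrix[k][i] != adj_matrix[k][j]:  # ties received
            return False
    return True


def equivalence_classes(adj_matrix):
    """Group actors greedily, comparing each to the first member of a group."""
    classes = []
    for i in range(len(adj_matrix)):
        for group in classes:
            if structurally_equivalent(adj_matrix, group[0], i):
                group.append(i)
                break
        else:
            classes.append([i])
    return classes


# Hypothetical "provides food for" relation: actors 0 and 1 are parents,
# actors 2, 3 and 4 are children.
feeds = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]

print(equivalence_classes(feeds))
```

The two parents fall into one position and the three children into another, which is the sense in which positions stand in for roles.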
118. Structural Equivalence: same
ties to same people
Automorphic Equivalence: graph-theoretically
identical – without
labels, automorphic nodes are
indistinguishable
Regular Equivalence*: Similar
pattern of ties to similar types of
nodes
*Formally: If actors i and j are regularly equivalent, and actor i has a tie to/from some actor, k, then actor j must have the same kind of tie
to/from some actor l, and actors k and l must be regularly equivalent. This means it’s a recursive definition and not always unique.
130. Repeat the process on the resulting 1-blocks until you have reached structurally equivalent
blocks.
Because CONCOR splits every subgroup into two groups, you get a partition
tree that looks something like this:
131. PAJEK: A free program for drawing
& analyzing networks, optimized for
very large networks. Windows or Unix
(not so much mac?)
http://mrvar.fdv.uni-lj.si/pajek/
Software
Stand-alone tools
132. UCINET: A general program for analyzing networks.
Point-and-Click. Has some very specialized tools.
135. Software
General tools
R, RStudio
Fast becoming the field standard, particularly for statistical modeling of networks. Easy to construct measures and add to models if you work in R for the rest of your workflow.
Not the best data management program…
136. STATNET
•Program designed to estimate statistical models on networks in R.
Statnet Team
http://csde.washington.edu/statnet/
Other R Resources:
Carter Butts (UC-Irvine, Sociology) – SNA & PermNet
•Program for general network analysis in R
•Does most of what we’ve discussed today…
Moreno claimed authorship of the idea, citing no real inspiration other than his own creativity. But graph theory has been going since the 1740s, kinship diagrams since at least the 1870s, org charts from the 1920s, etc. So he’s not operating in a vacuum.
Consider the following example. Here we have sampled respondents (red dots) reporting on their interaction with romantic partners. A classic local network module would ask about their characteristics and behaviors, then attempt to relate those characteristics to ego’s behavior. All of these sampled nodes have the exact same number of partners.
But these nodes are situated in dramatically different parts of the real underlying global network. Here some of them (lower left) are truly local isolates, but most are embedded in a larger network structure.
These are real data from Add Health, on romantic involvement.
Start w. simply opening CNAT.
Then look at ZOOM, Refresh, etc.
Then re-draw the network based on Degree
Then redraw from a particular ego.
"ambulatory care sensitive conditions" (ACSCs). ACSCs are conditions for which good outpatient care can potentially prevent the need for hospitalization, or for which early intervention can prevent complications or more severe disease. These are conditions for which good primary care may prevent admission. We used 12 conditions that met the AHRQ definition.
We examined fixed effects for each PPC, controlling for patient and physician characteristics. PPC fixed effects were jointly significant in the model at P < 0.01, suggesting that PPCs are associated with ACSA rates. The differences in performance were substantial. For example, compared to a mean number of ACSAs of 0.060 per beneficiary per year for all PPCs, the PPC at the 25th percentile of ACSA rates had 0.050 ACSAs per beneficiary, whereas the PPC at the 75th percentile had a 46% higher ACSA rate, 0.073 per beneficiary (data not shown).
On average, ACSA rates differed by 36% between PPCs that admit to the same hospital.
Notice that the adjacency list states that a is tied to b and b is tied to a and c. The arc list merely lists all of the tied nodes or relationships.
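The two formats hold the same information, which a small Python sketch can make concrete. The three-node example follows the a, b, c ties described in the note, with c's tie back to b added as an assumption so the network is symmetric; the function names are illustrative.

```python
def adjacency_to_arcs(adjacency):
    """Flatten an adjacency list into an arc list of (sender, receiver) pairs."""
    return [(i, j) for i in sorted(adjacency) for j in sorted(adjacency[i])]


def arcs_to_adjacency(arcs):
    """Rebuild the adjacency list: one entry per sender, listing its receivers."""
    adjacency = {}
    for sender, receiver in arcs:
        adjacency.setdefault(sender, []).append(receiver)
    return adjacency


# a is tied to b; b is tied to a and c; c is tied to b (assumed).
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
arcs = adjacency_to_arcs(adjacency)
print(arcs)
print(arcs_to_adjacency(arcs))
```

The adjacency list has one row per node, while the arc list has one row per tie, so the arc list grows with the number of relationships rather than the number of actors.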
Take-home story here is that position is unstable: lots of movement across positions.
It turns out that many observed networks have a characteristic involvement (degree) distribution, where a very small number of people have many ties, and most people have very few.
So most of us have 1 or 2 lifetime sex partners (far left of the graph), while a few NBA stars have many more (the far right).
This distribution is usually so skewed, that it makes sense to plot the histogram in log-log form, where the characteristic distribution then becomes clear. A power-law distribution often emerges, which has this functional form.
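A rough Python sketch of the log-log view, assuming the usual power-law form in which the share of actors with degree k is proportional to k raised to a negative exponent, so the points fall on a straight line on log-log axes. The degree sequence below is made up for illustration, and a least-squares slope on log-log points is only a rough guide, not a rigorous way to fit a power law.

```python
import math
from collections import Counter


def loglog_degree_distribution(degrees):
    """Return (log10 k, log10 p(k)) points for every observed degree k >= 1."""
    counts = Counter(d for d in degrees if d >= 1)
    total = sum(counts.values())
    return [(math.log10(k), math.log10(c / total))
            for k, c in sorted(counts.items())]


def loglog_slope(points):
    """Ordinary least-squares slope through the log-log points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# Hypothetical skewed degree sequence: most actors have 1 tie, a few have many.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
points = loglog_degree_distribution(degrees)
print(round(loglog_slope(points), 3))
```

In this constructed example the counts fall off as one over k squared, so the fitted slope comes out at minus two.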
An empirical example of interest to population types: The sexual contact network among high-risk actors in Colorado Springs. Most (mainly johns) have only one partner, but a few high-activity pros have many.
695 actors represented
Longest path is 17 steps
Average distance is about 5 steps
Average person is within 3 steps of 75 other people
137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths
To illustrate, consider these four networks, each with an identical volume of social ties.
The graphs become more difficult to separate, and the number of independent paths increases. This is illustrated in the far left, where we can always trace at least 3 completely independent paths between every pair of people in the net.
Here we have box-plots for 4 different hierarchy models. The first two assume perfect mutual cliques within the tightest clusters; the last two assume chain-mutuality. These last 3 models are the best-fitting, over the pure “ranked-clusters of M Cliques” model. The main issue is that (a) we likely have less than perfect cliques within clusters and (b) we likely have multiple hierarchies…
The take-home here is that we have strong evidence from both the distribution of degree and the distribution of triads (and a long body of theory on schools!) that these settings are hierarchically ordered, with those receiving the most nominations at the “top” of the status hierarchy.