4. Introduction to
Network Analysis
What is a Network?
What is a Social Network?
Mathematical Representation of the
Relationships Between Units such as
Actors, Institutions, Software, etc.
Special class of graph Involving
Particular Units and Connections
6. Social Science
For Images and Links to
Underlying projects:
http://jhfowler.ucsd.edu/
3D HiDef SCOTUS Movie
Co-Sponsorship in Congress
Spread of Obesity
Hiring and Placement of
Political Science PhDs
7. Social Science
The 2004 Political Blogosphere
(Adamic & Glance)
High School Friendship
(Moody)
Roll Call Votes in United States Congress
(Mucha et al.)
14. Example: Nodes in an Actor-Based Social Network
Alice
Bill
Carrie
David
Ellen
How Can We Represent The
Relevant Social Relationships?
Terminology & Examples
21. A Survey Based Example
“Which of the above individuals
do you consider a close friend?”
Imagine We Surveyed 5 Actors:
(1) Daniel,
(2) Jennifer,
(3) Josh,
(4) Bill,
(5) Larry
22. From an Edge List to a Matrix
              1  2  3  4  5
---------------------------
Daniel   (1)  0  1  1  1  1
Jennifer (2)  1  0  1  0  0
Josh     (3)  0  1  0  1  1
Bill     (4)  0  0  0  0  0
Larry    (5)  1  1  1  1  0
*Directed Connections (Arcs): 13
1 2
1 3
1 4
1 5
2 1
2 3
3 4
3 5
3 2
5 1
5 4
5 3
5 2
ROWS → COLUMNS
*How to Read the Edge List: (Person in Column 1 names Person in Column 2 as a close friend)
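The conversion from the edge list to the matrix can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `edge_list_to_matrix` is hypothetical, and the names and the 13 arcs are copied from slides 21 and 22.

```python
# Names from slide 21; the 13 arcs are copied from the edge list on slide 22.
names = ["Daniel", "Jennifer", "Josh", "Bill", "Larry"]
arcs = [(1, 2), (1, 3), (1, 4), (1, 5),
        (2, 1), (2, 3),
        (3, 4), (3, 5), (3, 2),
        (5, 1), (5, 4), (5, 3), (5, 2)]


def edge_list_to_matrix(arcs, n):
    """Return an n x n adjacency matrix; entry [i][j] is 1 if person i+1 names person j+1."""
    matrix = [[0] * n for _ in range(n)]
    for source, target in arcs:
        # Survey IDs start at 1, Python indices start at 0.
        matrix[source - 1][target - 1] = 1
    return matrix


adjacency = edge_list_to_matrix(arcs, len(names))

# Print one row per respondent: rows are senders, columns are receivers.
for name, row in zip(names, adjacency):
    print(f"{name:<9}", *row)
```

Because the arcs are directed, the matrix need not be symmetric: Daniel names Bill, but Bill names no one.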
23. From a Survey to a Network
              1  2  3  4  5
---------------------------
Daniel   (1)  0  1  1  1  1
Jennifer (2)  1  0  1  0  0
Josh     (3)  0  1  0  1  1
Bill     (4)  0  0  0  0  0
Larry    (5)  1  1  1  1  0
24. A Quick Law Based
Example of a
Dynamic Network
25. United States Supreme Court
To Play Movie of the Early SCOTUS Jurisprudence:
http://vimeo.com/9427420
Documentation is Available Here:
http://computationallegalstudies.com/2010/02/11/the-development-of-structure-in-the-citation-network-of-the-united-states-supreme-court-now-in-hd/
35. The Origin of Network
Science is Graph Theory
The Königsberg Bridge Problem
the first theorem in graph theory
Is It Possible to cross each bridge
once and only once?
36. The Königsberg Bridge Problem
Leonhard Euler
(Pronounced Oil-er)
proved that this
was not possible
Is It Possible to
cross each bridge
once and only once?
37. Eulerian and
Hamiltonian Paths
Eulerian path: traverse
each edge exactly once
If the starting point and end point are the same:
only possible if no node has an odd degree
(each path must both visit and leave each shore)
If there is no need to return to the starting point:
can have 0 or 2 nodes with an odd degree
Hamiltonian path: visit
each vertex exactly once
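The degree-parity rule for Eulerian paths can be checked directly. The sketch below is illustrative only: the function names are hypothetical, the land-mass labels are my own, and the seven bridges follow the standard historical layout (two banks, the central island, and the eastern island). It assumes the graph is connected.

```python
from collections import Counter

# The seven bridges of Königsberg as a multigraph (labels are illustrative).
konigsberg = [("north", "island"), ("north", "island"),
              ("south", "island"), ("south", "island"),
              ("north", "east"), ("south", "east"), ("island", "east")]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def euler_status(edges):
    """Classify a connected multigraph using the parity rule from the slide."""
    odd = len(odd_degree_nodes(edges))
    if odd == 0:
        return "circuit"   # can return to the starting point
    if odd == 2:
        return "path"      # must start and end at the two odd nodes
    return "none"


print(odd_degree_nodes(konigsberg), euler_status(konigsberg))
```

All four land masses have odd degree, which is why no walk crosses each bridge exactly once.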
39. Moreno, Heider, et al.
and the Early Scholarship
Focused Upon Determining the Manner in
Which Society was Organized
Developed early techniques to represent the
social world Sociogram/ Sociograph
Obviously did not
have access to
modern computing
power
40. Stanley Milgram’s
Other Experiment
Milgram was interested in the
structure of society
Including the social distance
between individuals
While the term “six degrees” is often
attributed to Milgram, it can be traced to ideas
from Hungarian author Frigyes Karinthy
What is the average distance
between two individuals in
society?
42. Six Degrees of Separation?
NE
MA
Target person worked in Boston as a stockbroker
296 senders from Boston and Omaha.
20% of senders reached target.
Average chain length = 6.5.
And So the term ...
“Six degrees of Separation”
43. Six Degrees
Six Degrees is a claim that “average path
length” between two individuals in society
is ~ 6
The idea of ‘Six Degrees’ was popularized
through plays/movies and the Kevin
Bacon game
http://oracleofbacon.org/
46. But What is Wrong
with Milgram’s Logic?
150^2 = 22,500
150^3 = 3,375,000
150^4 = 506,250,000
150^5 = 75,937,500,000
47. The Strength of ‘Weak’ Ties
Does Milgram get
it right? (Mark Granovetter)
Visualization Source: Early Friendster – MIT Network
www.visualcomplexity.com
Strong and Weak Ties
(Clustered
v.
Spanning)
Clustering ----
My Friends’ Friends
are also likely to
be friends
48. So Was Milgram Correct?
Small Worlds (i.e. Six Degrees) was a theoretical
and an empirical Claim
The Theoretical Account Was Incorrect
The Empirical Claim was still intact
The open question: how could real social networks
display both small worlds and clustering?
At the Same time, the Strength of Weak Ties was
also a Theoretical and Empirical proposition
49. Watts and Strogatz (1998)
A few random links in an otherwise clustered
graph yield the types of small world
properties found by Milgram
“Randomness” is the key bridge between the small
world result and the clustering that is
commonly observed in real social networks
50. Watts and Strogatz (1998)
A Small Amount of Random Rewiring or
Something akin to Weak Ties—Allows for
Clustering and Small Worlds
Random Graph
Locally Clustered
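The rewiring idea can be explored with a small pure-Python sketch. It is a simplified variant of the Watts and Strogatz procedure, not a reproduction of it: all function names are hypothetical, each lattice edge has one end moved with probability p, and the average path length is taken over reachable pairs only in case rewiring disconnects the graph.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node links to its k nearest neighbours (k even) around a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """With probability p, move one end of each lattice edge to a random node."""
    n = len(adj)
    new = {i: set(nbrs) for i, nbrs in adj.items()}
    for i, j in [(i, j) for i in adj for j in adj[i] if i < j]:
        if rng.random() < p:
            choices = [t for t in range(n) if t != i and t not in new[i]]
            if choices:
                t = rng.choice(choices)
                new[i].discard(j)
                new[j].discard(i)
                new[i].add(t)
                new[t].add(i)
    return new


def average_clustering(adj):
    """Mean share of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean geodesic distance over all reachable ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(42)  # fixed seed so runs are repeatable
lattice = ring_lattice(20, 4)
for p in (0.0, 0.1, 1.0):
    g = rewire(lattice, p, rng)
    print(p, round(average_clustering(g), 3), round(average_path_length(g), 3))
```

Comparing the printed clustering and path length across values of p gives a feel for the trade-off described on the slide; larger networks show the effect more clearly than this 20-node toy.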
53. The Milgram Experiment
How did the successful subjects actually
succeed?
How did they manage to get the envelope
from Nebraska to Boston?
This is a question regarding how
individuals conduct searches in their
networks
Given most individuals do not know the
path to distantly linked individuals
54. Search in Networks
Most individuals do not know the path to
an individual who is many hops away
Must rely on some sort of heuristic rules
to determine the possible path
55. Search in Networks
What information about the problem might
the individual attempt to leverage?
Visual by Duncan Watts
dimensional data:
send it to a stockbroker
send it to the closest possible city to Boston
56. Follow up to
the original
Experiment
available at:
http://research.yahoo.com/pub/2397
Published in
Science in 2003
63. Shortest Paths
The shortest set of links
connecting two nodes
Also known as the geodesic path
In many graphs, there are multiple
shortest paths
64. Shortest Paths
A and C are connected by
2 shortest paths
A – E – B - C
A – E – D - C
Diameter: the largest geodesic distance
in the graph
The distance between A and C is
the maximum for the graph: 3
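A breadth-first search finds both the geodesic distance and the number of shortest paths. The sketch below is illustrative: the function names are hypothetical, and the edge set is an assumption reconstructed from the two paths listed on the slide (the original figure is not reproduced in the text and may contain other edges).

```python
from collections import deque

# Assumed edges, taken from the two listed paths A-E-B-C and A-E-D-C.
edges = [("A", "E"), ("E", "B"), ("E", "D"), ("B", "C"), ("D", "C")]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Return (dist, count): geodesic distance and number of shortest paths from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u lies on a shortest path to v
                count[v] += count[u]
    return dist, count


def diameter(adj):
    """Largest geodesic distance in a connected graph."""
    return max(max(bfs_counts(adj, s)[0].values()) for s in adj)


adj = build_adjacency(edges)
dist, count = bfs_counts(adj, "A")
print(dist["C"], count["C"], diameter(adj))
```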
65. Shortest Paths
In the Watts-Strogatz Model
Shortest Paths are reduced by
increasing levels of random rewiring
67. Density
Density = Of the connections
that could exist between n nodes,
what fraction are present?
directed graph: emax = n*(n-1)
(each of the n nodes can connect to (n-1) other nodes)
undirected graph: emax = n*(n-1)/2
(since edges are undirected, count each one only once)
68. Density
What fraction are present?
density = e / emax
For example, out of 12
possible connections,
this graph has 7,
giving it a density of
7/12 = 0.58
A “fully connected” graph has a density = 1
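The density calculation is short enough to write out. In this sketch the function names are hypothetical; the first example assumes the slide's graph is a directed graph on 4 nodes, since that is the case in which n*(n-1) gives 12 possible connections.

```python
def max_edges(n, directed):
    """Largest possible number of edges between n nodes (no self-loops)."""
    return n * (n - 1) if directed else n * (n - 1) // 2


def density(e, n, directed):
    """Fraction of possible connections that are present: e / emax."""
    return e / max_edges(n, directed)


# Slide example: 7 of 12 possible connections (assumed directed, n = 4).
print(round(density(7, 4, directed=True), 2))

# Survey network from slide 22: 13 arcs among 5 actors.
print(density(13, 5, directed=True))
```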
69. Connected Components
We are often interested in whether
the graph has a single or multiple
connected components
Strong Components
Giant Component
Weak Components
70. Netlogo
Basic Simulation
Platform for Agent
Based Modeling &
Simple Network
Simulation
http://ccl.northwestern.edu/netlogo/
Wilensky (1999)
HIV / VOTING Hawk/Dove
(A Classic from
Evolutionary Game Theory)
71. Netlogo
Please Download Netlogo as we
will be using it occasionally
throughout this tutorial
http://ccl.northwestern.edu/netlogo/
Wilensky (1999)
76. Degree Distributions
outdegree
how many directed edges (arcs)
originate at a node
indegree
how many directed edges (arcs) are
incident on a node
degree (in or out)
number of edges incident on a node
Indegree=3
Outdegree=2
Degree=5
77. Node Degree
from
Matrix Values
Outdegree:
outdegree for node 3 = 2,
which we obtain by counting
the number of non-zero
entries in the 3rd row
Indegree:
indegree for node 3 = 1,
which we obtain by counting
the number of non-zero
entries in the 3rd column
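Row and column counts are easy to compute from a matrix. The matrix pictured on this slide is not reproduced in the text, so the sketch below uses the survey matrix from slide 22 instead; its values for node 3 therefore differ from the ones quoted above. Function names are hypothetical.

```python
# Survey matrix from slide 22 (rows send, columns receive).
adjacency = [
    [0, 1, 1, 1, 1],  # Daniel   (1)
    [1, 0, 1, 0, 0],  # Jennifer (2)
    [0, 1, 0, 1, 1],  # Josh     (3)
    [0, 0, 0, 0, 0],  # Bill     (4)
    [1, 1, 1, 1, 0],  # Larry    (5)
]


def out_degree(matrix, node):
    """Count non-zero entries in the node's row (node IDs start at 1)."""
    return sum(1 for value in matrix[node - 1] if value != 0)


def in_degree(matrix, node):
    """Count non-zero entries in the node's column (node IDs start at 1)."""
    return sum(1 for row in matrix if row[node - 1] != 0)


for node in range(1, 6):
    print(node, out_degree(adjacency, node), in_degree(adjacency, node))
```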
78. Degree Distributions
These are degree counts for particular nodes
but we are also interested in the distribution
of arcs (or edges) across all nodes
These Distributions are called “degree
distributions”
Degree distribution: A frequency count of
the occurrence of each degree
80. Degree Distributions
Imagine we have this 8 node network:
In-degree distribution:
[(2,3) (1,4) (0,1)]
Out-degree distribution:
[(2,4) (1,3) (0,1)]
(undirected) distribution:
[(3,3) (2,2) (1,3)]
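A degree distribution in the (degree, count) notation used above can be built with a frequency counter. The 8 node network in the figure is not available in the text, so this sketch uses the 5 actor survey edge list from slide 22; the function name is hypothetical.

```python
from collections import Counter

nodes = [1, 2, 3, 4, 5]
arcs = [(1, 2), (1, 3), (1, 4), (1, 5),
        (2, 1), (2, 3),
        (3, 4), (3, 5), (3, 2),
        (5, 1), (5, 4), (5, 3), (5, 2)]


def degree_distribution(degrees):
    """Return [(degree, number of nodes with that degree), ...], highest degree first."""
    return sorted(Counter(degrees).items(), reverse=True)


# Count arcs leaving and entering each node; a missing key counts as 0.
out_deg = Counter(source for source, _ in arcs)
in_deg = Counter(target for _, target in arcs)

out_dist = degree_distribution(out_deg[n] for n in nodes)
in_dist = degree_distribution(in_deg[n] for n in nodes)
print("Out-degree distribution:", out_dist)
print("In-degree distribution:", in_dist)
```

A useful sanity check on any directed network: the in-degrees and the out-degrees must each sum to the number of arcs.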
81. Why are Degree
Distributions Useful?
They are the signature of a dynamic process
We will discuss in greater detail tomorrow
Consider several canonical network models
90. Readings on Power law /
Scale free Networks
Check out Lada Adamic’s Power Law Tutorial
Describes distinctions between the Zipf,
Power-law and Pareto distribution
http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
This is the original paper that gave rise to
all of the other power law networks papers:
A.-L. Barabási & R. Albert, Emergence of scaling in random
networks, Science 286, 509–512 (1999)
93. How Do I Know Something
is Actually a Power Law?
94. Clauset, Shalizi & Newman
http://arxiv.org/abs/0706.1062
argues for the use of MLE
instead of linear regression
Demonstrates that a number
of prior papers mistakenly
called their distribution a
power law
Here is why you should use
Maximum Likelihood Estimation
(MLE) instead of linear
regression
You recover the power law
when it is present
Notice spread between the
Yellow and red lines
95. Back to the Random Graph
Models for a Moment
Poisson distribution
Erdős-Rényi is the default random
graph model:
randomly draw E edges
between N nodes
There are no hubs in the network
Rather, there exists a narrow
distribution of connectivities
96. Back to the Random Graph
Models for a Moment
let there be n people
p is the probability that any two of them are ‘friends’
Binomial → Poisson (limit: p small) → Normal (limit: large n)
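The chain of limits can be written out explicitly. This is the standard textbook statement, added here as a sketch; the symbol λ for the mean degree is my notation and does not appear on the slide. With n people and friendship probability p, the number of friends k of any one person is binomial, which approaches a Poisson distribution when p is small and n is large with λ = p(n-1) held fixed, and a Poisson distribution is in turn approximately normal when λ is large.

```latex
P(k) \;=\; \binom{n-1}{k}\, p^{k} (1-p)^{\,n-1-k}
\;\;\longrightarrow\;\;
\frac{\lambda^{k} e^{-\lambda}}{k!}
\qquad (p \to 0,\; n \to \infty,\; \lambda = p(n-1) \text{ fixed})

\frac{\lambda^{k} e^{-\lambda}}{k!}
\;\approx\;
\frac{1}{\sqrt{2\pi\lambda}}\,
\exp\!\left( -\frac{(k-\lambda)^{2}}{2\lambda} \right)
\qquad (\lambda \text{ large})
```

In every one of these forms the probability of a very high degree falls off quickly, which is why the random graph has no hubs.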
98. Generating Power Law
Distributed Networks
Pseudocode for growing power law networks:
Start with a small number of nodes
Add new vertices one by one
Each new edge connects to an existing vertex with probability
proportional to the number of edges that vertex
already has (i.e. preferential attachment)
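One way to turn the pseudocode into runnable Python is sketched below. It is a minimal illustration under stated assumptions, not the Barabási-Albert reference implementation: the function name is hypothetical, the seed is a short chain, and each new vertex adds exactly one edge.

```python
import random
from collections import Counter


def grow_preferential_network(n_nodes, rng, n_seed=2):
    """Grow a network one vertex at a time; each newcomer adds one edge."""
    # Start with a small chain of n_seed nodes (n_seed must be at least 2).
    edges = [(i, i + 1) for i in range(n_seed - 1)]
    # Every vertex appears in `endpoints` once per edge it touches, so a
    # uniform draw from this list picks a vertex in proportion to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(n_seed, n_nodes):
        target = rng.choice(endpoints)      # preferential attachment
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


rng = random.Random(7)  # fixed seed so runs are repeatable
edges = grow_preferential_network(1000, rng)
degree = Counter(v for edge in edges for v in edge)
print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
print("nodes with degree 1:", sum(1 for d in degree.values() if d == 1))
```

The printed summary typically shows a handful of heavily connected early nodes alongside a large number of degree 1 nodes, the skew the slide describes.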
99. Growing Power Law
Distributed Networks
The previous pseudocode is not a unique solution
A variety of other growth dynamics are possible
In the simple case this is a system that is extremely
“sensitive to initial conditions”
upstarts who garner early advantage are able to
extend their relative advantage in later periods
for example, imagine you receive a higher interest
rate the more money you have: “rich get richer”
100. Just To Preview The
Application to Positive
Legal Theory ....
101. Power Laws Appear to be a
Common Feature of Legal Systems
Katz, et al (2011)
American Legal Academy
Katz & Stafford (2010)
American Federal Judges
Geist (2009)
Austrian Supreme Court
Smith (2007)
U.S. Supreme Court
Smith (2007)
U.S. Law Reviews
Post & Eisen (2000)
NY Ct of Appeals
104. Node Level Measures
Sociologists have long been interested in
roles / positions that various nodes occupy
within networks
For example, various centrality measures
have been developed
Here is a non-exhaustive List:
Degree
Closeness
Betweenness
Hubs/Authorities
105. Degree
Degree is simply a count of the number of
arcs (or edges) incident to a node
Here the nodes are sized by degree:
106. Degree as a measure
of centrality
Please Calculate the “degree” of each of the nodes
107. Degree as a measure
of centrality
ask yourself, in which case does “degree” appear
to capture the most important actors?
108. Degree as a measure
of centrality
what about here, does it capture the “center”?
109. Closeness Centrality
Closeness is based on the inverse of the
distance of each actor to every other
actor in the network
Closeness Formula:
Normalized Closeness Formula:
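The formulas themselves did not survive in the slide text. The sketch below uses a common definition as an assumption: closeness of node i is 1 divided by the sum of its geodesic distances to all other nodes, and the normalized version multiplies that by (n - 1). Function names are hypothetical, and the graph is assumed to be connected and undirected.

```python
from collections import deque


def distances_from(adj, source):
    """Geodesic distance from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """Inverse of the summed distance from node to every other node."""
    return 1 / sum(distances_from(adj, node).values())


def normalized_closeness(adj, node):
    """Closeness rescaled by (n - 1) so the best possible score is 1."""
    return (len(adj) - 1) * closeness(adj, node)


# Three actors in a line: A - B - C. B sits in the middle.
line = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
for actor in sorted(line):
    print(actor, round(closeness(line, actor), 3), round(normalized_closeness(line, actor), 3))
```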
112. Betweenness Centrality
Idea is related to
bridges, weak ties
This individual may
serve an important
function
Betweenness
centrality counts
the number of
geodesic paths
between i & k that
actor j resides on
114. Betweenness Centrality
Check these yourself:
gjk = the number of
geodesics connecting j &
k, and
gjk(i) = the number of those that
actor i is on
Note: there is also a normalized
version of the formula
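Betweenness can be computed by summing gjk(i) / gjk over all pairs j, k. The sketch below is a brute-force illustration for small undirected graphs, not an efficient algorithm; the function names are hypothetical and the five-node example graph is an assumption built from the paths A-E-B-C and A-E-D-C on slide 64.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, count): distances and numbers of shortest paths from source."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count


def betweenness(adj):
    """Unnormalized betweenness: sum over pairs {j, k} of g_jk(i) / g_jk."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {i: 0.0 for i in adj}
    for j, k in combinations(sorted(adj), 2):
        dist_j, count_j = info[j]
        dist_k, count_k = info[k]
        if k not in dist_j:
            continue                      # j and k are not connected
        g_jk = count_j[k]
        for i in adj:
            if i in (j, k) or i not in dist_j or i not in dist_k:
                continue
            # i is on a geodesic exactly when the two legs add up to d(j, k).
            if dist_j[i] + dist_k[i] == dist_j[k]:
                score[i] += count_j[i] * count_k[i] / g_jk
    return score


adj = {"A": {"E"}, "E": {"A", "B", "D"}, "B": {"E", "C"},
       "D": {"E", "C"}, "C": {"B", "D"}}
scores = betweenness(adj)
print(scores)
```

In this example E sits on every geodesic out of A, so it scores highest even though B and D have similar degree.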
115. Betweenness Centrality
Betweenness is a very
powerful concept
We will return when we discuss
community detection in
networks ... If you want to
preview check out this paper:
Michelle Girvan & Mark Newman, Community
structure in social and biological networks,
Proc. Natl. Acad. Sci. USA 99, 7821–7826
(2002)
High Betweenness actors need
not be actors that score high on
other centrality measures (such
as degree, etc.)
[see picture to the right]
116. Hubs and Authorities
The Hubs and Authorities Algorithm
(HITS) was developed by Computer
Scientist Jon Kleinberg
Similar to the Google “PageRank”
Algorithm developed by Larry Page
Kleinberg is a MacArthur Fellow and
has offered a number of major
contributions
117. Hubs and Authorities
We are interested in BOTH:
to whom a webpage links
and
From whom it has received links
In Ranking a Webpage ...
118. Hubs and Authorities
Intuition --
If we are trying to rank a webpage
having a link from the New York
Times is worth more than one from a
random person’s blog
HITS offers a significant improvement
over measuring degree as degree treats
all connections as equally valuable
119. Hubs and Authorities
Relies upon ideas such as recursion
Measure who is important?
Measure who is important to who
is important?
Measure who is important to who
is important to who is important ?
Etc.
120. Hubs and Authorities
Hubs: Hubs are highly-valued lists for
a given query
for example, a directory page from a major encyclopedia
or paper that links to many different highly-linked pages
would typically have a higher hub score than a page that
links to relatively few other sources.
Authority: Authorities are highly
endorsed answers to a query
A page that is particularly popular and linked by many
different directories will typically have a higher
authority score than a page that is unpopular.
Note: A Given WebPage could be both a hub and an authority
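The recursive idea behind hubs and authorities can be sketched as a simple iteration. This is an illustration under stated assumptions, not Kleinberg's published procedure: the function name and page names are hypothetical, and scores are rescaled so the largest is 1 (Kleinberg's paper normalizes differently).

```python
def hits(links, iterations=50):
    """Iterate hub and authority scores over a non-empty list of (source, target) links."""
    pages = sorted({page for link in links for page in link})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages linking in.
        auth = {page: 0.0 for page in pages}
        for source, target in links:
            auth[target] += hub[source]
        # Hub: sum of the authority scores of pages linked to.
        new_hub = {page: 0.0 for page in pages}
        for source, target in links:
            new_hub[source] += auth[target]
        # Rescale so the top score in each list is 1 (an assumption of this sketch).
        top_auth, top_hub = max(auth.values()), max(new_hub.values())
        auth = {page: value / top_auth for page, value in auth.items()}
        hub = {page: value / top_hub for page, value in new_hub.items()}
    return hub, auth


# Hypothetical mini web: a directory links to two sources, a blog links to one.
links = [("directory", "nyt"), ("directory", "journal"), ("blog", "nyt")]
hub, auth = hits(links)
print("hub:", hub)
print("authority:", auth)
```

Here the directory earns the top hub score because it points at both sources, and the page that both others point at earns the top authority score.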
121. Hubs and Authorities
Hubs and Authorities has been used in a
wide number of social science articles
There exist some variants of the
Original HITS Algorithm
Here is the Original Article :
Jon Kleinberg, Authoritative sources in a
hyperlinked environment, Journal of the
Association for Computing Machinery, 46 (5):
604–632 (1999).
Note: there is a 1998 edition as well
122. Calculating Centrality
Measures
Thankfully, centrality measures, etc. need not be
calculated by hand
Lots of software packages ...
in increasing levels of difficulty ... left to right
Difference in functions, etc. across the packages
easy: accepts
microsoft
excel files
Medium: requires
the .net / .paj
file setup
Hard: has lots of
features
(R or Python)
124. Advanced Network Science Topics
Community Detection
ERGM Models
Diffusion /
Social Epidemiology
http://computationallegalstudies.com/2009/10/11/programming-dynamic-models-in-python/
166. Community Detection Review Articles
Mason A. Porter, Jukka-Pekka Onnela and Peter J. Mucha. 2009.
Communities in Networks. Notices of the American Mathematical Society
56: 1082-1166.
Santo Fortunato. 2010. Community detection in graphs. Physics Reports.
486: 75-174.
Michael J. Bommarito II, Daniel Martin Katz
187. Network Analysis & Law
Mapping Social Structure of Legal Elites
(Hustle & Flow Article)
Diffusion, Norm Adoption and other
Related Processes
(JLE Article)
Legal Doctrine and Legal Rules
(Sinks Paper with Application to
Patents, Legal Doctrine, etc.)
194. Collected Nearly 19,000 Law Clerk ‘Events’
1995 - 2005 For All Article III Judges
Relying Upon Data From Staff Directories
Network Analysis of
the Federal Judiciary
195. The Core Claim
In the Aggregate ...
Law Clerk Movements Reveal
Social or Professional Relationships
Between Judicial Actors
196. Network Analysis of
the Federal Judiciary
Judge E
Justice Z
Justice Y
Judge C
Judge D
Judge B
Judge A
205. Reproduction of Hierarchy?
A Social Network Analysis of the
American Law Professoriate
Daniel Martin Katz
Josh Gubler
Jon Zelner
Michael Bommarito
Eric Provins
Eitan Ingall
206. Motivation for Project
Why Do Certain Paradigms, Histories, Ideas Succeed?
Function of the ‘Quality’ of the Idea
Social Factors also Influence the Spread of Ideas
Most Ideas Do Not Persist ....
207. Law Professors are Important Actors
Agents of Socialization
Repositories / Distributors of information
Socialize Future lawyers, Judges & law Professors
Responsible for Developing Particular Legal Ideas
(Brandwein (2007) ; Graber (1991), etc.)
Law Professor Behavior is an Important
Component of Positive Legal Theory
208. Social Network Analysis
Method for Characterizing Diffusion / Info Flow
Method for Tracking Social Connections, etc.
Method for Ranking Components based
upon Various Graph Based Measures
230. Hub Score
Score Each Institution’s Placements by
Number and Quality of Links
Normalized Score (0, 1]
Similar to the Google PageRank™ Algorithm
Measure who is important?
Measure who is important to who is important?
Run Analysis Recursively...
239. Top 20 Institutions
(By Raw Placements)
[Bar chart; axis from 0 to 1,000. Institutions shown, in order: Harvard, Yale, Michigan, Columbia, Chicago, NYU, Stanford, Berkeley, UVA, Georgetown, Penn, Northwestern, Texas, Duke, UCLA, Cornell, Wisconsin, BU, Illinois, Minnesota]
241. Highly Skewed Nature of
Legal Systems
Smith 2007
Post & Eisen 2000
Katz & Stafford 2010
242. Implications for Rankings
Rankings only Imply Ordering ( >, =, < )
End Users tend to Conflate Ranks with
Linearized Distances Between Units
(Tversky 1977)
Non-Stationary Distances Between Entities
Both Trivial and Large Distances
Linearity Heuristic Often Works
Assuming Linearity Can Prove Misleading
244. Why Computational
Simulation?
History only Provides a Single Model Run
Computational Simulation allows ...
Consideration of Alternative “States of the world”
Evaluation of Counterfactuals
245. Computational Model of
Information Diffusion
We Apply a simple Disease Model to
Consider the Spread of Ideas, etc.
Clear Tradeoff Between Structural Position
in the Network and “Idea Infectiousness”
246. A Basic Description
of the Model
Consider a Hypothetical Idea Released
at a Given Institution
Infectiousness Probability = p
Two Forms of Diffusion...
Direct Socialization
Signal Giving to Former Students
Infect neighbors, neighbors' neighbors, etc.
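A single model run of this kind can be sketched as follows. This is a simplified stand-in (an independent-cascade style spread), not the authors' model: the function names, the school labels, and the placement links are all hypothetical, and each newly "infected" institution gets one chance to pass the idea along each outgoing link with probability p.

```python
import random


def spread_idea(adj, seed, p, rng):
    """One run: return the set of institutions the idea reaches from seed."""
    infected = {seed}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adj.get(node, ()):
                # Each link transmits the idea with probability p.
                if neighbor not in infected and rng.random() < p:
                    infected.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return infected


def mean_reach(adj, seed, p, runs, rng):
    """Average number of institutions reached over many model runs."""
    return sum(len(spread_idea(adj, seed, p, rng)) for _ in range(runs)) / runs


# Hypothetical placement network: an arrow means "places graduates at".
placements = {
    "School A": ["School B", "School C"],
    "School B": ["School D"],
    "School C": ["School D"],
    "School D": [],
    "School E": ["School A"],
}

rng = random.Random(2009)  # fixed seed so runs are repeatable
for school in ("School A", "School E", "School D"):
    print(school, mean_reach(placements, school, p=0.5, runs=500, rng=rng))
```

Averaging over many runs is what allows comparison of structural position: with the same p, an idea released at a well-placed institution reaches more of the network on average.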
247. Lots of Channels of Information Diffusion
Among Legal Academics
Judicial Decisions, Law Reviews, Other Materials
Academic Conferences, Other Professional Orgs
SSRN, Legal Blogosphere, etc.
Channels of Diffusion
Other Channels of Information Dissemination
Legal Socialization / Training
252. Run a Simulation
on Your Desktop
http://computationallegalstudies.com/2009/04/22/the-revolution-will-not-be-televised-but-will-it-come-from-harvard-or-yale-a-network-analysis-of-the-american-law-professoriate-part-iii/
(Requires Java 5.0 or Higher)
253. From a Single Run to
Consensus Diffusion Plot
Netlogo is Good for Model Demonstration
Regular Programming Language Typically
Required for Full Scale Implementation
We Used Python
http://ccl.northwestern.edu/netlogo/
http://www.python.org/
Object Oriented Programming Language
254. From a Single Run to
Consensus Diffusion Plot
Repeated the Diffusion Simulation
Hundreds of Model Runs Per School
Yielded a Consensus Plot for Each School
Results for Five Emblematic Schools
Exponential, linear and sub-linear
256. Differential Host Susceptibility
Some Potential
Model Improvements?
Countervailing Information / Paradigms
S I R Model Susceptible-Infected-Recovered
257. Directions for
Future Research
Longitudinal Data
Hiring/Placement/Laterals
Currently Collecting Data
Database Linkage to Articles/Citations
Working with Content Providers
Empirical Evaluation of Simulation
Computational Linguistics
Text Mining, Sentiment Coding
259. Example Project #3:
On the Road to the
Legal Genome Project ...
Dynamic Community Detection
&
Distance Measures for
Dynamic Citation Networks
260. Distance Measures for
Dynamic Citation Networks
Michael J. Bommarito II
Daniel Martin Katz
Jon Zelner
James H. Fowler
266. How Can We
Track the Novel
Combination,
Mutation and
Spread of Ideas?
267. Information Genome Project
The Development, Mutation
and Spread of Ideas
Precedent in Common Law Systems
Patent Citations
Bibliometric Analysis
291. Cases Decided by
the Supreme Court
Citations in the
Current Year
Citations from
prior years
PLAY MOVIE!
http://computationallegalstudies.com/2010/02/11/the-development-of-structure-in-the-citation-network-of-the-united-states-supreme-court-now-in-hd/
308. Legal Analytics
Class 11 - Network Analysis + Law
daniel martin katz
blog | ComputationalLegalStudies
corp | LexPredict
twitter | @computational
site | danielmartinkatz.com
michael j bommarito
blog | ComputationalLegalStudies
corp | LexPredict
twitter | @mjbommar
site | bommaritollc.com
more content available at legalanalyticscourse.com