Pruning cooccurrence networks

Raf Guns
raf.guns@uantwerpen.be

Density problem
Dense networks are hard to
 visualize
 interpret
Solution: pruning networks
 PathFinder (Schvaneveldt, 1990)
 Deleting low-weight links (De Nooy, Mrvar, and Batagelj, 2005)
 Cocitation and bibliographic coupling (Persson, 2010)
 Threshold for cosine values (Leydesdorff, 2007; Egghe & Leydesdorff, 2009)
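
As a point of comparison for the approaches listed above, the simplest form of pruning (deleting low-weight links) can be sketched as follows. This is a minimal illustration only: the function name `prune_low_weight_links`, the toy link weights, and the threshold value are assumptions made for this example and do not come from the cited works.

```python
def prune_low_weight_links(weights, threshold):
    """Keep only the links whose weight is at least `threshold`.

    `weights` maps a node pair to its link weight (hypothetical format).
    """
    return {pair: w for pair, w in weights.items() if w >= threshold}


# Toy weighted network with made-up weights.
links = {("A", "B"): 5, ("A", "C"): 1, ("B", "C"): 2}
pruned = prune_low_weight_links(links, threshold=2)
```

A fixed threshold like this treats every link the same way regardless of how well connected its endpoints are.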

Cooccurrence networks
E.g. cocitation, bibliographic coupling, coauthorship…
Especially prone to density problem
[Figure: a two-mode network (e.g., authors and citing papers) and the cooccurrence network derived from it]
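
The step from a two-mode network to a cooccurrence network can be sketched as follows. This is a minimal sketch assuming the two-mode network is stored as a mapping from each citing paper to the set of authors it cites; the function name `cooccurrence_counts` and the toy data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(two_mode):
    """Count, for each pair of authors, the papers that cite both.

    `two_mode` maps each citing paper to the set of authors it cites.
    """
    counts = Counter()
    for authors in two_mode.values():
        # Every pair of authors cited by the same paper cooccurs once.
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += 1
    return counts


# Toy two-mode network: three citing papers, three cited authors.
papers = {"p1": {"A", "B"}, "p2": {"A", "B", "C"}, "p3": {"B", "C"}}
cooc = cooccurrence_counts(papers)
```

Because every paper that cites k authors contributes k(k-1)/2 cooccurrences, a few long reference lists are enough to make the resulting network dense.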

Steps
Based on Zweig and Kaufmann (2011): we start from a two-mode network
1. Define pattern of interest
2. Determine interestingness of cooccurrence
3. If cooccurrence is interesting, authors are linked

Why interestingness?
Highly cited author
 High cooccurrence counts with many other authors
Citing paper referring to many authors under consideration
 Resulting cooccurrences are less important

Determining interestingness
Here: z-score of the cooccurrence count, z = (observed cooccurrence count - Exp) / σ
How to determine Exp and σ?
 Estimate by sampling from the Fixed Degree Sequence Model (FDSM): all two-mode networks with the same node degrees
 Markov Chain Monte Carlo simulation: link swapping
 If p < 0.001 (or z > 3.29), we consider the link interesting
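
The z-score test can be sketched as follows, assuming the cooccurrence counts of one author pair in the sampled networks are already available as a list. The function names, the use of the population standard deviation, the normal approximation for the p-value, and the toy numbers are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def z_score(observed, sampled):
    """z-score of an observed cooccurrence count against sampled counts.

    Exp and sigma are estimated as the mean and standard deviation of the
    counts found for the same pair in the sampled networks.
    """
    exp = mean(sampled)
    sigma = pstdev(sampled)
    if sigma == 0:
        # No variation in the samples: the z-score is undefined, so treat
        # any deviation as infinitely strong (an arbitrary convention).
        if observed == exp:
            return 0.0
        return math.copysign(math.inf, observed - exp)
    return (observed - exp) / sigma


def two_tailed_p(z):
    """Two-tailed p-value of a z-score under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up sampled counts for a single author pair.
sampled_counts = [2, 3, 2, 3, 2, 3, 2, 3]
z = z_score(5, sampled_counts)
p = two_tailed_p(z)
```

A pair is then linked in the pruned network only if its z-score exceeds the chosen threshold.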

Link swapping
[Figure: link swapping in a two-mode network (e.g., authors and citing papers)]
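
A minimal sketch of link swapping, assuming the two-mode network is stored as a collection of (paper, author) links. The function name `swap_links`, the number of attempted swaps, and the toy network are assumptions for illustration; how many swaps are needed for a well-mixed sample is not addressed here.

```python
import random


def swap_links(edges, n_attempts, rng):
    """Randomize a two-mode network while keeping all node degrees fixed.

    Each attempt picks two links (p1, a1) and (p2, a2) and tries to
    replace them with (p1, a2) and (p2, a1). Attempts that would create
    a duplicate link are rejected, so fewer swaps may actually happen.
    """
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edge_list)), 2)
        (p1, a1), (p2, a2) = edge_list[i], edge_list[j]
        if (p1, a2) in edge_set or (p2, a1) in edge_set:
            continue  # swap would duplicate an existing link
        edge_set -= {(p1, a1), (p2, a2)}
        edge_set |= {(p1, a2), (p2, a1)}
        edge_list[i], edge_list[j] = (p1, a2), (p2, a1)
    return edge_set


# Toy two-mode network of (citing paper, cited author) links.
original = {("p1", "A"), ("p1", "B"), ("p2", "B"), ("p2", "C"), ("p3", "C")}
randomized = swap_links(original, n_attempts=100, rng=random.Random(42))
```

Every accepted swap leaves each paper with the same number of cited authors and each author with the same number of citing papers, which is the constraint that defines the FDSM.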

Author cocitation
Author (co-)citations to
 12 authors from bibliometrics
 12 authors from information retrieval
in Scientometrics and JASIS, 1996-2000
Same data set studied by
 Ahlgren, Jarneving & Rousseau (2003)
 Egghe & Leydesdorff (2009)
 Leydesdorff & Vaughan (2006)

Author cocitations: FDSM and z-scores

Bibliographic coupling
Bibliographic coupling of all JASIST articles, 1999-2000
 n = 371
 12 981 unique references
Two VOSviewer maps
 cosine normalization
 FDSM and z-scores
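
For reference, cosine normalization of bibliographic coupling can be sketched as follows, assuming each article is represented by the set of references it cites. The function name `cosine_coupling` and the toy reference sets are illustrative assumptions, and the exact normalization used for the map may differ in detail.

```python
import math


def cosine_coupling(refs_a, refs_b):
    """Cosine-normalized bibliographic coupling of two reference sets.

    The number of shared references is divided by the geometric mean of
    the two reference list lengths.
    """
    if not refs_a or not refs_b:
        return 0.0
    shared = len(refs_a & refs_b)
    return shared / math.sqrt(len(refs_a) * len(refs_b))


# Two toy articles with four references each, two of them shared.
article_1 = {"r1", "r2", "r3", "r4"}
article_2 = {"r3", "r4", "r5", "r6"}
strength = cosine_coupling(article_1, article_2)
```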

Bibliographic coupling: cosine

Bibliographic coupling: FDSM and z-scores

Conclusions
Advantages
1. Both positive and negative cooccurrences
2. Thresholds correspond to specific p-values
3. Accounts for degree variations of bottom nodes
Disadvantages
1. Some nodes may become isolates
2. More computationally intensive than cosine similarity
