The MediaWiki platform supports popular socio-technical systems such as Wikipedia as well as thousands of other wikis. This software encodes and records a variety of relationships about the content, history, and editors of its articles, such as hyperlinks between articles, discussions among editors, and editing histories. These relationships can be analyzed using standard techniques from social network analysis; however, extracting relational data from Wikipedia has traditionally required specialized knowledge of its API, information retrieval, network analysis, and data visualization, a requirement that has inhibited scholarly analysis. We present a software library called the NodeXL MediaWiki Importer that extracts a variety of relationships from the MediaWiki API and integrates with the popular NodeXL network analysis and visualization software. This library allows users to query and extract a variety of multidimensional relationships from any MediaWiki installation with a publicly accessible API. We present a case study examining the similarities and differences between different relationships for the Wikipedia articles about "Pope Francis" and "Social media." We conclude by discussing the implications this library has for both theoretical and methodological research as well as community management, and we outline future work to expand the capabilities of the library.
Analyzing Multidimensional Networks within MediaWikis
1. Analyzing Multidimensional Networks within MediaWikis
WikiSym 2013
Hong Kong, China
August 7, 2013
Brian Keegan, Ph.D.
@bkeegan
Arber Ceni
Marc A. Smith, Ph.D.
@marc_smith
3. Motivation
• Collaboration is fundamentally relational
• Use network analysis methods to understand success of wikis
• A variety of MediaWiki meta-data accessible through API are relational
• Build on top of existing network analysis package to simplify retrieval, structuring, cleanup, and visualization
5. User-Object relationships
• Editing
  • user e makes a revision to article a
• Watchlist
  • user e has article a on watchlist
• Affiliation
  • user e is a member of project a
[Diagram: user e linked to object a]
6. Undirected User-User relationships
• Co-authorship
  • e1 and e2 edited the same article
• Co-affiliation
  • e1 and e2 are members of the same project
[Diagram: undirected tie between users e1 and e2]
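The co-authorship tie above can be derived from editing records by projecting user-article pairs onto user-user pairs. The following is a minimal Python sketch of that idea, not the importer's own code; the function name `coauthorship_edges` and the sample user names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_edges(edits):
    """Project (user, article) edit pairs onto undirected user-user ties.

    Two users are tied if they edited at least one article in common.
    """
    editors_by_article = defaultdict(set)
    for user, article in edits:
        editors_by_article[article].add(user)

    edges = set()
    for editors in editors_by_article.values():
        # Sorting gives each undirected tie a single canonical (e1, e2) form.
        for e1, e2 in combinations(sorted(editors), 2):
            edges.add((e1, e2))
    return edges


# Hypothetical editing records; repeated edits by one user count once.
edits = [
    ("alice", "Pope Francis"),
    ("bob", "Pope Francis"),
    ("alice", "Pope Francis"),
    ("bob", "Social media"),
    ("carol", "Social media"),
]
print(sorted(coauthorship_edges(edits)))
```

The same projection applies to co-affiliation if the second element of each pair is a project rather than an article.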
7. Directed User-User relationships
• Discussion
  • e1 left a message on e2’s talk page
• Article trajectory
  • e2 modified the article after e1
[Diagram: directed tie between users e1 and e2]
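An article trajectory can be sketched from a single article's revision history. The Python example below assumes that "after" means the immediately following revision and that consecutive edits by the same user are skipped; both are assumptions for illustration, and `trajectory_edges` is a hypothetical name.

```python
def trajectory_edges(revisions):
    """Build directed (e1, e2) ties where e2 made the revision right after e1.

    revisions: iterable of (timestamp, user) pairs for ONE article.
    ISO 8601 UTC timestamps in a uniform format sort correctly as strings.
    """
    ordered = sorted(revisions)
    edges = []
    for (_, earlier_user), (_, later_user) in zip(ordered, ordered[1:]):
        # Assumption: a user following their own edit creates no tie.
        if earlier_user != later_user:
            edges.append((earlier_user, later_user))
    return edges


# Hypothetical revision history, deliberately out of order.
revisions = [
    ("2013-03-13T19:10:00Z", "alice"),
    ("2013-03-13T19:05:00Z", "bob"),
    ("2013-03-13T19:12:00Z", "alice"),
    ("2013-03-13T19:20:00Z", "carol"),
]
print(trajectory_edges(revisions))
```

Unlike co-authorship, the order within each pair matters here: `("bob", "alice")` records that alice modified the article after bob.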
8. Undirected Object-Object relationships
• Shared authorship
  • a1 and a2 were edited by the same users
• Category co-membership
  • a1 and a2 are members of the same categories
[Diagram: undirected tie between articles a1 and a2]
10. Multidimensional networks
• Multiple types of links between nodes
  • Hyperlink
  • Shared authorship
  • Category co-membership
• Presence of overlapping ties may explain collaboration more richly
• Absence of overlapping ties may reveal anomalies for follow-on analysis
[Diagram: multiple ties between articles a1 and a2]
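One way to look for overlapping and missing ties is to keep each relationship as its own layer and record which layers contain each pair of articles. The Python sketch below uses made-up articles "A", "B", and "C" and treats every layer, including hyperlinks, as undirected for the sake of comparison; that simplification is an assumption, not something the slides specify.

```python
def tie(a, b):
    """Undirected tie: frozenset makes (a, b) and (b, a) the same key."""
    return frozenset((a, b))


# Hypothetical layers over three made-up articles.
layers = {
    "hyperlink": {tie("A", "B"), tie("A", "C")},
    "shared_authorship": {tie("A", "B"), tie("B", "C")},
    "category_co_membership": {tie("A", "B"), tie("A", "C")},
}


def tie_profile(layers):
    """Map each tie to the set of layer names in which it appears."""
    profile = {}
    for name, ties in layers.items():
        for t in ties:
            profile.setdefault(t, set()).add(name)
    return profile


profile = tie_profile(layers)

# Ties present in every layer: candidates for "overlapping" relationships.
overlapping = {t for t, names in profile.items() if len(names) == len(layers)}

# Ties present in some layers but not others, with the layers they lack.
missing = {
    t: set(layers) - names
    for t, names in profile.items()
    if len(names) < len(layers)
}

print(overlapping)
print(missing)
```

Here the pair A-C is hyperlinked and shares categories but has no shared authors, which is the kind of gap the slide suggests flagging for follow-on analysis.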
13. NodeXL Platform
• https://nodexl.codeplex.com/
• Lower barriers to entry by using spreadsheet workflows
• Network analysis plug-in for Microsoft Excel
• “Spigots” to import network data from Twitter, Facebook, Flickr, Email, YouTube, and WWW
14. NodeXL MediaWiki Importer
• https://wikiimporter.codeplex.com/
• Graph data provider for NodeXL → new “spigot”
• Queries MediaWiki API through DotNetWikiBot framework
• Given a Page and a Site, returns a PageList
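The importer itself is a .NET component that goes through DotNetWikiBot, so the following Python sketch is not its code. It only illustrates the kind of MediaWiki API request that sits underneath an editing-history query, using the standard `action=query` and `prop=revisions` parameters. The function names, the page id, and the user names in the sample response are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def revisions_query_url(api_endpoint, title, limit=50):
    """Build a MediaWiki API URL asking for a page's recent revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp",  # who edited, and when
        "rvlimit": limit,
        "format": "json",
    }
    return api_endpoint + "?" + urlencode(params)


def edit_edges(response):
    """Turn a decoded JSON response into (user, article) editing ties."""
    edges = []
    for page in response.get("query", {}).get("pages", {}).values():
        for rev in page.get("revisions", []):
            user = rev.get("user")  # absent when the user name is hidden
            if user is not None:
                edges.append((user, page["title"]))
    return edges


url = revisions_query_url("https://en.wikipedia.org/w/api.php", "Pope Francis")
print(url)

# Hand-written sample shaped like a revisions response (illustrative values).
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "title": "Pope Francis",
                "revisions": [
                    {"user": "alice", "timestamp": "2013-03-13T19:10:00Z"},
                    {"user": "bob", "timestamp": "2013-03-13T19:05:00Z"},
                ],
            }
        }
    }
}
print(edit_edges(sample_response))
```

Any MediaWiki installation with a publicly accessible API could be targeted by changing the endpoint argument.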
16. Case Study
• Compare the structures of different relationships across two types of English Wikipedia articles
  • “Social media”
  • “Pope Francis”
• Node layout via “Harel-Koren Fast Multiscale”
  • Spring-embedding layout to emphasize clusters of ties
• Nodes grouped via “Clauset-Newman-Moore”
  • Nodes assigned to group if more ties within group than outside
• “Group-in-a-box” layout
  • Ties within group visualized individually, ties between groups collapsed together
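The grouping rule above is a simplification: Clauset-Newman-Moore greedily merges groups to increase modularity, a score that rewards ties inside groups beyond what node degrees alone would predict. The Python sketch below computes only that score for a given grouping, not the merging algorithm itself, and is independent of NodeXL's implementation. The function name and the toy graph are hypothetical.

```python
def modularity(edges, groups):
    """Modularity Q of a grouping of an undirected, unweighted graph.

    Q = sum over groups c of  L_c / m  -  (d_c / (2 * m)) ** 2
    where m is the number of ties, L_c the ties inside group c,
    and d_c the total degree of the nodes in group c.
    """
    m = len(edges)
    group_of = {node: i for i, members in enumerate(groups) for node in members}
    internal = [0] * len(groups)
    degree = [0] * len(groups)
    for u, v in edges:
        degree[group_of[u]] += 1
        degree[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            internal[group_of[u]] += 1
    return sum(
        internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(groups))
    )


# Toy graph: two triangles joined by a single tie between nodes 3 and 4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

by_triangle = [{1, 2, 3}, {4, 5, 6}]   # the intuitive grouping
mixed = [{1, 2, 4}, {3, 5, 6}]         # a grouping that cuts across triangles

print(modularity(edges, by_triangle))
print(modularity(edges, mixed))
```

The grouping that follows the two triangles scores higher than the one that cuts across them, which matches the intuition of "more ties within group than outside".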
19. User discussion
[Figure panels: Pope Francis | Social media]
Nodes are editors who contributed to the article
Nodes are linked if they left messages on other users’ talk pages
20. Shared authorship
[Figure panels: Pope Francis | Social media]
Nodes are other articles edited by the users who contributed to the article
Articles are linked if they share multiple co-authors
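The "multiple co-authors" rule for this network can be sketched as a weighted projection with a cutoff. In the Python example below, the threshold `min_shared=2` is an assumed reading of "multiple", and the function name and sample records are hypothetical rather than taken from the importer.

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_authorship_edges(edits, min_shared=2):
    """Tie two articles when at least `min_shared` users edited both.

    edits: iterable of (user, article) pairs.
    Returns {(a1, a2): number_of_shared_editors} with a1 < a2.
    """
    articles_by_user = defaultdict(set)
    for user, article in edits:
        articles_by_user[user].add(article)

    shared = Counter()
    for articles in articles_by_user.values():
        # Each user adds one shared editor to every pair they edited.
        for a1, a2 in combinations(sorted(articles), 2):
            shared[(a1, a2)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical editing records.
edits = [
    ("alice", "Pope Francis"), ("alice", "Vatican City"),
    ("bob", "Pope Francis"), ("bob", "Vatican City"), ("bob", "Social media"),
    ("carol", "Pope Francis"), ("carol", "Social media"),
]
print(shared_authorship_edges(edits))
```

Raising `min_shared` thins the network to the most strongly connected articles, while setting it to 1 keeps every pair with any editor in common.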
23. Discussion
• Wikipedia and other MediaWiki projects contain a variety of complex and multidimensional relationships among users and objects
• NodeXL MediaWiki Importer is a tool for simplifying complex data extraction and analysis workflows
• NodeXL provides a powerful suite of tools to analyze and visualize the structure of multidimensional relationships
• Empirical testing of social theories as well as diagnosing the health of online communities