3. Bursts in social networks
• Bursts of edits on Wikipedia in particular
• When do those occur?
4. What can we learn by looking at
spikes in edit frequency?
• How have edit spikes changed over Wikipedia's ten
years of existence?
• Does the size of an edit spike correlate with anything?
12. Regular Expressions (Regex)
A Perl script uses regular expressions to find and
output matching pieces of text.
In this case, I am pulling out dates in Wikipedia's
day-month-year format and rewriting them in a
more machine-readable MM/DD/YYYY format.
11/08/2011
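A minimal sketch of the date rewrite described on this slide, written in Python rather than the Perl used in the talk. The sample history line, the function name `extract_dates`, and the exact pattern are assumptions for illustration; the original script is not shown in the slides.

```python
import re

# Month names as they appear in Wikipedia's "8 November 2011" style dates.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
MONTH_NUMBER = {name: i for i, name in enumerate(MONTHS, start=1)}

# Day, month name, four-digit year (assumed pattern, not the author's).
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(text):
    """Return every day-month-year date in text, rewritten as MM/DD/YYYY."""
    return [
        f"{MONTH_NUMBER[month]:02d}/{int(day):02d}/{year}"
        for day, month, year in DATE_RE.findall(text)
    ]


# Hypothetical line in the style of a pasted revision history.
sample = "14:32, 8 November 2011 ExampleUser (talk | contribs) (12,345 bytes)"
print(extract_dates(sample))
```

Each matching date in the pasted text yields one output entry, so a page edited three times on one day produces that date three times.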
13. Data manipulation
Copy/paste the revision history of wiki
pages into a text document, which I
feed to my Perl script
This produces lists with one date entry
per edit, so a date repeats once for each edit made that day
Copying/pasting isn't super
elegant, but I haven't gotten
LWP::UserAgent stuff to work yet
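One possible alternative to copying and pasting, sketched in Python rather than Perl: ask the MediaWiki API for revision timestamps and turn them into the same MM/DD/YYYY list. This is not the workflow used in the talk. The helper names `revisions_url` and `edit_dates` are hypothetical, the sample response is made up, and the sketch ignores continuation, so it covers only one batch of revisions.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def revisions_url(title, limit=500):
    """Build a query URL asking for revision timestamps of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp",
        "rvlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def edit_dates(payload):
    """Turn a JSON response body into a list of MM/DD/YYYY strings."""
    data = json.loads(payload)
    dates = []
    for page in data["query"]["pages"].values():
        for rev in page.get("revisions", []):
            # Timestamps look like 2011-11-08T14:32:00Z; keep the date part.
            year, month, day = rev["timestamp"][:10].split("-")
            dates.append(f"{month}/{day}/{year}")
    return dates


# Made-up response body, shaped like the API's JSON output.
sample = json.dumps({"query": {"pages": {"1": {"revisions": [
    {"timestamp": "2011-11-08T14:32:00Z"},
    {"timestamp": "2011-11-08T09:10:00Z"},
]}}}})
print(revisions_url("Example page"))
print(edit_dates(sample))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.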
14. Excel!
• Throw my lists of dates into a pivot table, which
shows me how often each date occurs
• Some VLOOKUP magic allows me to combine
these edit frequencies of individual actors into
one big list covering every day from 6/1/2001 to
the present
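The pivot-table and lookup steps above can be approximated outside Excel. The following is a sketch in Python under stated assumptions: the actor names and dates are placeholder data, and `daily_counts` and `combine` are hypothetical helper names, not part of the original workflow.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(dates):
    """Pivot-table step: count how often each MM/DD/YYYY string occurs."""
    return Counter(datetime.strptime(d, "%m/%d/%Y").date() for d in dates)


def combine(per_actor, start, end):
    """Lookup step: one row per calendar day, one column per actor.

    Days on which an actor's page had no edits get a count of 0.
    """
    counts = {name: daily_counts(ds) for name, ds in per_actor.items()}
    rows = []
    day = start
    while day <= end:
        rows.append((day, *[counts[name][day] for name in per_actor]))
        day += timedelta(days=1)
    return rows


# Placeholder data: one date string per edit, as produced by the script.
per_actor = {
    "actor_a": ["06/01/2001", "06/01/2001", "06/03/2001"],
    "actor_b": ["06/02/2001"],
}
for row in combine(per_actor, date(2001, 6, 1), date(2001, 6, 3)):
    print(row)
```

Because every calendar day gets a row, days without edits show up explicitly as zeros, which is what the combined list in Excel provides.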
16. Problems
9 actors over 10 years means close to 100k cells
Excel is not built for speed
Matlab might work better
17. What does the data look like over
time?
• June 1 to May 31 of each year, from 2001 (when Wikipedia's current edit numbers
begin) to 2010 (when all of the bursts have settled down)
40. If we tweak the data to take
importance into consideration…
• Average gross, adjusted for inflation*
• Only available for a small number of the actors in the
sample set
• Taken from boxofficemojo.com
• Extremely reliable source
45. -10 days to +40 days (log)
[Chart: one line per actor, log scale. Series: coburn, peck, brando, davis, palance, goulet, ledger, swayze. x-axis: days 1 to 50; y-axis: 0 to 3.]
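A sketch of how a 50-day, log-scaled window like this one might be cut from daily edit counts, in Python. Several things here are assumptions: that the window is centered on a single event date per actor, that it runs from day -10 up to but not including day +40 (giving the 50 points on the x-axis), and that 1 is added before taking log10 so that days with zero edits are defined. The name `burst_window` is hypothetical.

```python
import math
from datetime import date, timedelta


def burst_window(counts, event_date, before=10, after=40):
    """Return log10(count + 1) for each day from -before to +after - 1.

    counts maps a date to the number of edits on that day; missing
    days are treated as zero edits.
    """
    values = []
    for offset in range(-before, after):
        day = event_date + timedelta(days=offset)
        values.append(math.log10(counts.get(day, 0) + 1))
    return values


# Placeholder counts around a made-up event date.
event = date(2008, 1, 22)
counts = {event: 999, event + timedelta(days=1): 99}
window = burst_window(counts, event)
print(len(window), window[9:13])
```

With the +1 shift, a day with 999 edits plots at 3 and a day with no edits plots at 0, which matches the 0 to 3 range of the chart's y-axis.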
46. Other things I should consider
• Age at death
• Cause of death
• Were they still acting?
47. Future directions
• New sample of Wikipedia pages
• Need to compare more contemporary pages
• Need new metrics for comparison
• Better workflows