Preserving Born-Digital News Panel JCDL 2016
1. Collecting, Analyzing, and Linking TV News and
Social Media Collections
@mart1nkle1n
#jcdl2016 Newark, NJ, 06/21/2016
Panel: Preserving Born-Digital News
Collecting, Analyzing, and Linking
TV News and
Social Media Collections
Peter Broadwell
@peterbroadwell
Martin Klein
@mart1nkle1n
University of California Los Angeles
Research Library
2. International Digitizing Ephemera Project
http://digital.library.ucla.edu/dep/
• Iranian Green Movement
• Tahrir Square Unrest
• Zanzibar Riots
• Israel, South Africa, Argentina, Cuba, Armenia, Ukraine
3. Collecting Social Media - Tweets
• Tahrir Square Egypt & Libya unrest, 2011
• Tōhoku earthquake and tsunami, Japan, 2011
• AirAsia 8501 crash, December 2014
• Charlie Hebdo shooting, January 2015
• GOP and Democratic Party presidential debates 2015/16
4. Collecting Social Media - Tweets
Social Feed Manager
http://social-feed-manager.readthedocs.org/
5. Collecting TV News - NewsScape
• 289,174 hours of TV news archived digitally
• Recorded 2005-present, ca. 145 shows/day
• 46 networks, 13 countries, 9 languages
• Searchable by captions, official transcripts, on-screen text
• 3.55 billion words indexed
7-9. Linking TV News and Social Media
[Image-only slides]
10. CNN, 09/16/2015, 05:22pm
11. CNN, 09/16/2015, 05:22pm
Twitter, 09/16/2015, 06:22pm
12. Linking via Automated Entity Detection
• Discover and highlight commonalities and relationships between disjoint collections on related news events
• Link to authorities
• Address the problem of disambiguation
• Improve discoverability and reusability
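One way to picture the linking step: once the mentions in each collection have been resolved to authority URIs, otherwise disjoint collections can be joined on the URIs they share. The sketch below works under that assumption only. The function names, the record layout (record id mapped to a set of URIs), and the sample records are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def index_by_uri(records):
    """Map each authority URI to the ids of the records that mention it."""
    index = defaultdict(set)
    for record_id, uris in records.items():
        for uri in uris:
            index[uri].add(record_id)
    return index


def link_collections(tweets, tv_segments):
    """Return {uri: (tweet ids, TV segment ids)} for URIs found in both."""
    tweet_index = index_by_uri(tweets)
    tv_index = index_by_uri(tv_segments)
    shared = tweet_index.keys() & tv_index.keys()
    return {
        uri: (sorted(tweet_index[uri]), sorted(tv_index[uri]))
        for uri in sorted(shared)
    }


# Hypothetical sample data: record id -> URIs detected in that record.
REAGAN = "http://dbpedia.org/resource/Ronald_Reagan"
CANADA = "http://dbpedia.org/resource/Canada"
tweets = {"tweet-1": {REAGAN}, "tweet-2": {REAGAN, CANADA}}
tv_segments = {"cnn-segment-1": {REAGAN}}

links = link_collections(tweets, tv_segments)
print(links)
```

Joining on URIs rather than on surface strings is what lets the disambiguation bullet pay off: two different spellings of a name end up under one key, provided the entity detection resolved them to the same resource.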
15. Experimental Exploration
• Apply DBpedia Spotlight Named Entity Recognition (NER) software to collections on the second GOP presidential primary debate on 09/16/2015
• Twitter: 800,000 tweets
• TV: CNN coverage of debate
• Minute granularity
• Persons, Organizations, Places
Results:
• Linked entities with URIs to DBpedia resources
• Visualization of correlations between entities
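The experiment above can be sketched roughly as follows: sort each detected entity into Persons, Organizations, or Places, count mentions per minute in each collection, and compare the two series. This is an illustration under stated assumptions, not the authors' pipeline. It assumes entity records shaped like DBpedia Spotlight JSON resources, with a comma-separated "@types" string; it uses Pearson correlation as a stand-in, since the slides do not say how correlations were computed; and all sample data is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from math import sqrt

# Assumed type markers inside the "@types" string of a Spotlight resource.
CATEGORIES = {
    "DBpedia:Person": "Persons",
    "DBpedia:Organisation": "Organizations",
    "DBpedia:Place": "Places",
}


def categorize(resource):
    """Return the category for one Spotlight-style resource, or None."""
    types = resource.get("@types", "").split(",")
    for marker, category in CATEGORIES.items():
        if marker in types:
            return category
    return None


def minute_counts(mentions, uri):
    """Count mentions of `uri` per UTC minute, keyed as 'HH:MM'."""
    counts = Counter()
    for timestamp, mention_uri in mentions:
        if mention_uri == uri:
            key = timestamp.astimezone(timezone.utc).strftime("%H:%M")
            counts[key] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical sample data, not figures from the talk.
TRUMP = "http://dbpedia.org/resource/Donald_Trump"
resource = {"@URI": TRUMP, "@types": "DBpedia:Agent,DBpedia:Person"}


def at(minute):
    """Timezone-aware timestamp at 23:<minute> UTC on the sample day."""
    return datetime(2015, 9, 16, 23, minute, tzinfo=timezone.utc)


twitter = [(at(0), TRUMP)] * 2 + [(at(1), TRUMP)] + [(at(2), TRUMP)] * 3
newsscape = [(at(0), TRUMP)] + [(at(2), TRUMP)] * 2

minutes = ["23:00", "23:01", "23:02"]
tw, tv = minute_counts(twitter, TRUMP), minute_counts(newsscape, TRUMP)
tw_series = [tw[m] for m in minutes]
tv_series = [tv[m] for m in minutes]
r = pearson(tw_series, tv_series)
print(categorize(resource), tw_series, tv_series, r)
```

Normalizing every timestamp to UTC before bucketing keeps the two collections on one time axis even when their sources record local times in different zones.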
16. Persons
[Chart: time series from 23:00 to 03:59; series: Trump Twitter, Trump NewsScape; y-axes 0 to 800 and 0 to 10]
http://sologlo.library.ucla.edu/visualizations/gop_debate/rickshaw/graphs/gop_persons.html
17. Places
[Chart: time series from 23:00 to 03:59; series: Canada Twitter, Canada NewsScape; y-axes 0 to 60 and 0 to 6]
http://sologlo.library.ucla.edu/visualizations/gop_debate/rickshaw/graphs/gop_places.html
18. Persons
[Chart: time series from 23:00 to 03:59; series: Fiorina Twitter, Fiorina NewsScape; y-axes 0 to 400 and 0 to 10]
http://sologlo.library.ucla.edu/visualizations/gop_debate/rickshaw/graphs/gop_persons.html
19. Organizations
[Chart: time series from 23:00 to 03:59; series: PP Twitter, PP NewsScape, HP Twitter, HP NewsScape; y-axes 0 to 300 and 0 to 10]
http://sologlo.library.ucla.edu/visualizations/gop_debate/rickshaw/graphs/gop_orgs.html
20. http://dbpedia.org/resource/Ronald_Reagan
23. Hashtags
http://sologlo.library.ucla.edu/visualizations/gop_debate/rickshaw/graphs/gop_orgs_hashtags.html
24. #DebateWithBernie
http://sologlo.library.ucla.edu/visualizations/gop_debate/rickshaw/graphs/gop_orgs_hashtags.html
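A hashtag view like the one above needs hashtags pulled out of the tweet text first. The following is a minimal sketch of that step, assuming plain tweet text as input. The regular expression is a simplification of Twitter's actual hashtag rules, and the function name and sample tweets are hypothetical.

```python
import re
from collections import Counter

# Simplified pattern: '#' followed by word characters. Twitter's own
# hashtag rules are more involved; this is an approximation.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(texts):
    """Count hashtags across tweet texts, ignoring case."""
    counts = Counter()
    for text in texts:
        for tag in HASHTAG.findall(text):
            counts[tag.lower()] += 1
    return counts


# Hypothetical tweet texts for illustration.
sample = [
    "Watching tonight #DebateWithBernie",
    "#debatewithbernie and #GOPDebate",
]
counts = hashtag_counts(sample)
print(counts.most_common())
```

Lowercasing before counting merges spelling variants of the same tag, which matters when tags are later plotted alongside entity mentions.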
Editor's notes
Gain a better understanding of:
• what we got
• maybe even what we did not get
• how the pieces may fit together and can be connected
Do this in an automated fashion
Make this discoverable