Linked Data for Digital Humanities research at Media Archives
1. Linked open data for digital humanities
research at media archives
Victor de Boer
Corpus Analysis of Time-Based Arts & Media
3. Digital Humanities
Part of the humanities researcher's effort is moving
from physical archives to digital ones
New possibilities for humanities research
Img:www.doaks.org, www.dkrz.de
4. Integrating collections as Linked Data
Tools built on top of the data
Continuous
enrichment
Embed in humanities methodology
Continuous collection enrichment
Multimedia analysis (image, text, video)
Human computation
28. Term extraction from TT888 subtitles
Web service (flowchart):
Input: OAI program metadata, or other subtitle source
- Extract subtitles
- Remove stopwords (Dutch stopword lists)
- N-grams frequency list, normalized with Dutch word frequencies; keep normalized frequency > T
- Named Entity Recognition (NER module); keep frequency > T
- Match with GTAA thesaurus (ElasticSearch)
Results
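The term-extraction flow on this slide (stopword removal, frequency list, normalization against Dutch word frequencies, threshold T, thesaurus match) could be sketched roughly as follows. This is a minimal illustration, not the actual web service: the stopword set, background frequencies, thesaurus entries and the `extract_terms` function are all hypothetical stand-ins, only single words (no longer n-grams) are counted, and the NER branch is left out.

```python
from collections import Counter

# Hypothetical stand-ins. The slide mentions Dutch stopword lists, Dutch word
# frequencies and the GTAA thesaurus (ElasticSearch); these tiny tables only
# imitate them for illustration.
STOPWORDS = {"de", "het", "een", "in", "van", "en", "is"}
BACKGROUND_FREQ = {"koningin": 0.0001, "amsterdam": 0.0005, "vandaag": 0.002}
THESAURUS = {"koningin", "amsterdam"}
DEFAULT_BACKGROUND = 0.001  # assumed frequency for words missing from the table


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [w for w in words if w]


def extract_terms(subtitle_text, threshold):
    """Return (term, score) pairs whose normalized frequency exceeds the
    threshold and that also occur in the thesaurus, highest score first."""
    tokens = [t for t in tokenize(subtitle_text) if t not in STOPWORDS]
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    results = []
    for term, count in counts.items():
        relative = count / total
        # Normalize against how common the word is in general Dutch.
        score = relative / BACKGROUND_FREQ.get(term, DEFAULT_BACKGROUND)
        if score > threshold and term in THESAURUS:
            results.append((term, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    text = ("De koningin is vandaag in Amsterdam. "
            "De koningin opent het museum in Amsterdam.")
    for term, score in extract_terms(text, threshold=100):
        print(term, round(score, 1))
```

Raising the threshold trades recall for precision: fewer, more distinctive terms survive the filter before the thesaurus match.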
29. Crowdsourcing: WAISDA? (What’s That?)
- Game-With-a-Purpose (GWAP)
- Allows internet users to annotate audiovisual archive
material in the form of a (serious) game
- The goal of the game is consensus between players
(which also works as a filter)
- Fun and competition as motivation
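Consensus between players as a filter can be illustrated with a small sketch. It assumes a tag counts as "matched" when two different players enter the same tag at nearly the same moment in a video. The `matched_tags` function, the data layout and the 10-second window are assumptions for illustration, not the game's documented rules.

```python
from collections import defaultdict


def matched_tags(entries, window_seconds=10.0):
    """entries: iterable of (player, tag, seconds_into_video).

    Returns the set of tags entered by at least two different players
    within `window_seconds` of each other (window size is an assumption).
    """
    by_tag = defaultdict(list)
    for player, tag, seconds in entries:
        by_tag[tag.strip().lower()].append((player, seconds))

    matched = set()
    for tag, uses in by_tag.items():
        for i, (player_a, time_a) in enumerate(uses):
            for player_b, time_b in uses[i + 1:]:
                close_in_time = abs(time_a - time_b) <= window_seconds
                if player_a != player_b and close_in_time:
                    matched.add(tag)
    return matched


if __name__ == "__main__":
    session = [
        ("anna", "Fiets", 12.0),
        ("bob", "fiets", 15.5),    # agrees with anna: matched
        ("anna", "boot", 40.0),
        ("anna", "boot", 42.0),    # same player twice: no consensus
        ("bob", "trein", 100.0),
        ("carl", "trein", 200.0),  # too far apart in time
    ]
    print(matched_tags(session))
```

Tags that never find a second player stay unverified, which is how agreement doubles as a quality filter.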
6-11-2018
34. RESULTS AND FINDINGS 1/2
Three implementations resulted in over a million social
tags (ongoing)
~40-50% of the social tags consist of matched tags
~10-20% of the social tags are unique
‘Super taggers’ are responsible for the vast majority of
the social tags that are added
36. Crowd-/Nichesourcing: Filmtagging
• Provenance of annotations
(who, when, what)
• Simple expertise model
developed based on
interviews with film scholars
• Cinematography, Cultural
history, Locations
• Varying needs, crowdsourcing
does provide opportunities to
broaden the annotations
Aschwin Stacia, Vrije Universiteit Amsterdam
38. Nichesourcing: Accurator
• Annotating large amounts of media objects
– Rijksmuseum prints
• Beyond crowdsourcing
• Find niche groups with required expertise
• Use the knowledge of expert laymen
• Cluster tasks based on topic
Chris Dijkshoorn, Rijksmuseum Amsterdam
44. Linked Data
"Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net"
47. Why Linked Open Data
Machine readable format
Standardized
Flexibility to connect heterogeneous data
Link what can be linked
Re-use and re-usability
Diagram: OBJECT, EVENT, PLACE, TIME, PERSON, CONCEPT, PROVENANCE
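To make "link what can be linked" concrete, here is a minimal sketch of objects from two collections connected through a shared event, written as plain subject-predicate-object triples. The `http://example.org/` namespace, the property names and the helper functions are hypothetical, chosen only for illustration; a real dataset would use established vocabularies and an RDF library or triple store.

```python
EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary

PHOTO = EX + "object/photo-1"
NEWSREEL = EX + "object/newsreel-7"
EXPEDITION = EX + "event/expedition-1"

# Each triple is (subject, predicate, object).
TRIPLES = {
    (PHOTO, EX + "depictsEvent", EXPEDITION),
    (NEWSREEL, EX + "depictsEvent", EXPEDITION),
    (EXPEDITION, EX + "hasPlace", EX + "place/new-guinea"),
    (EXPEDITION, EX + "hasActor", EX + "person/expedition-leader"),
    (PHOTO, EX + "heldBy", EX + "org/museum"),
    (NEWSREEL, EX + "heldBy", EX + "org/archive"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def related_via_event(triples, media_object):
    """Find other media objects that depict the same event(s)."""
    depicts = EX + "depictsEvent"
    related = set()
    for _, _, event in match(triples, subject=media_object, predicate=depicts):
        for other, _, _ in match(triples, predicate=depicts, obj=event):
            if other != media_object:
                related.add(other)
    return sorted(related)


if __name__ == "__main__":
    print(related_via_event(TRIPLES, PHOTO))
```

Because both institutions point at the same event identifier, the museum photo and the archive newsreel become mutually discoverable without either collection changing its internal schema.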
52. Wikidata retrieval service for the
CLARIAH media suite
• Based on interviews with five
users of the Media Suite,
focusing on 1) Drugs, 2) Sports,
3) Occupations, 4) History and
5) Disruptive media events
• Exploratory search by
properties
• Send SPARQL query to Wikidata
Query Service
• Retrieve list of persons based
on properties
• View additional information
(Wikidata/GTAA)
• Exploratory search
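Retrieving persons by property from the Wikidata Query Service might look like the sketch below. The query shape and function names are assumptions, not the Media Suite's actual implementation. The endpoint URL, `wdt:P31 wd:Q5` (instance of human) and the label service are standard Wikidata features. The User-Agent string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_person_query(property_id, value_id, limit=20):
    """Build a SPARQL query for humans having the given property value."""
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:{property_id} wd:{value_id} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "nl,en". }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query):
    """Encode the query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WDQS_ENDPOINT + "?" + params


def fetch_persons(property_id, value_id, limit=20):
    """Send the query (requires network access) and return (uri, label) pairs."""
    request = urllib.request.Request(
        build_request_url(build_person_query(property_id, value_id, limit)),
        headers={"User-Agent": "media-suite-sketch/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [
        (row["person"]["value"], row["personLabel"]["value"])
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Example: P106 is "occupation", Q82955 is "politician".
    print(build_person_query("P106", "Q82955", limit=5))
```

The returned Wikidata URIs could then be matched against GTAA person entries to show the additional information mentioned on the slide; that matching step is not shown here.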
55. Validation
● Interviewees
● Four tasks
○ Sports, politics, disruptive media events
● Share feedback
○ Discuss limitations
○ Propose improvements
● Added value for exploratory search
● Provides insight into background knowledge
● Participants report a feeling of grasping the context
● Data (in)completeness is a major issue
56. DIVE INTO THE EVENT-BASED
BROWSING OF LINKED HISTORICAL MEDIA
58. DIVE+
Access to Integrated Online Multimedia collections
using Linked Open Data
Interactive Exploration & Discovery in Context
linking objects to events and entities
building automatic storylines (narratives)
62. FOUR DATA SOURCES
OPENIMAGES.EU
3,220 news broadcasts
Netherlands Institute for Sound & Vision
GTAA thesaurus
DELPHER.NL
197,199 Scans of Radio
bulletins
1937 – 1984
AMSTERDAM MUSEUM
73,447 cultural heritage objects
AM Thesaurus
TROPENMUSEUM
78,270 cultural heritage objects
SVNC thesaurus
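65. Description → Event
- "Foto is genomen tijdens de Eerste Zuid Nieuw-Guinea Expeditie" [Photo was taken during the First South New Guinea Expedition] → Eerste Zuid Nieuw-Guinea Expeditie
- "Foto is genomen tijdens de Eerste- of de Tweede Zuid Nieuw-Guinea Expeditie" [Photo was taken during the First or the Second South New Guinea Expedition] → Tweede Zuid Nieuw-Guinea Expeditie
- "Masker gedragen tijdens oogstfeesten. Het feest in kwestie is het Sokari spel dat eenmaal per jaar wordt opgevoerd gedurende zeven opeenvolgende nachten na Nieuwjaar, medio april. …" [Mask worn during harvest festivals. The festival in question is the Sokari play, performed once a year on seven consecutive nights after New Year, mid-April. …] → Nieuwjaar
FROG NLP toolkit: NER event extraction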
Victor Kramer
https://languagemachines.github.io/frog/
67. DIVE+ Enrichments
Source  Enrichment method              Media Objects  Actors   Places  Events   Other    Alignments
OI      Crowd + NER                    3,204          1,249    1,412   1,916    185,846  623
NB      Interpreted + NER              197,200        194,890  54,571  197,200  6,736    6,353
AM      original thesaurus             73,447         66,966   5,973   148      28,047   6,865
TM      original thesaurus + FROG NER  78,226         27,829   3,896   23*      13,269   -
Total                                  352,077        290,934  65,852  199,264  233,898  -
*) more to come
69. DIVE+ UI: INFINITY OF EXPLORATION
/ http://diveplus.beeldengeluid.nl
/ Support exploration and serendipity /
/ Visual inspection of media objects and entities /
/ Lets users build, save and share Proto-Narratives /
70. Take home
Cultural Heritage Organisations (Libraries, Museums,
Archives) are becoming more Open, Smart, Connected
Continuous enrichment and linking of heterogeneous
collections brings new possibilities for access and analysis
Combination of semi-automatic methods, crowd- and
nichesourcing, expert annotations -> human(s) in the
loop
Shift from tech push to user needs. We show added
value for a variety of users
Data completeness and Quality are (and will remain)
key
71. Thank you
Lora Aroyo
Roeland Ordelman
Tim de Bruyn
John Brooks
Willem Melder
Maarten Brinkerink
Jesse de Vos
Ashwin Stacia
Chris Dijkshoorn
73. For more detailed and technical
information:
• Connecting Dutch and Flemish thesauri:
– http://www.victordeboer.com/digital-humanities/sound-and-vision/connecting-collections-across-national-borders/ (also links to a scientific paper)
• Enriching videos from Subtitles:
– http://www.victordeboer.com/digital-humanities/sound-and-vision/paper-about-automatic-labeling-in-ijdl/ (also links to a scientific paper)
• Waisda Game with a purpose:
– https://github.com/beeldengeluid/waisda
– http://blog.waisda.nl/
• Axes visual search (image recognition):
– http://labs.beeldengeluid.nl/application/6ade370a-1b50-11e5-b980-005056a71e3a
– https://github.com/kencoken/axes-lite
• DIVE+
– http://diveplus.beeldengeluid.nl
– http://diveproject.beeldengeluid.nl
• CLARIAH Media Suite
– http://mediasuite.clariah.nl
• More at http://labs.beeldengeluid.nl/