Linking thesauri
About me
Matthias Priem
VIAA
Manager Archiving
About VIAA
Not (only) a
Broadcast
Archive
CULTURAL
HERITAGE
We don’t
own
collections
Currently:
+/- 100 orgs
CITY ARCHIVES
BROADCAST
SERVICE
PROVIDER
DIGITISATION
VIAA FACTS & FIGURES
●  Total carriers identified: 650,000
●  Total carriers digitised today: 227,000
●  About 30% done (but most projects are started)
DIGITAL
ARCHIVE
SYSTEM
VIAA FACTS & FIGURES
●  Archive today: 80,000+ hours (2 PB)
●  Growing 15+ TB/day (that’s almost 0.5 PB / month)
EDUCATION
VIAA FACTS & FIGURES
●  In 50% of all schools (in < 1 year)
●  FIAT/IFTA nominee
VIAA and thesauri
●  VIAA archives collections of
more than 100 organisations
●  Multiple sectors with large
differences
●  Feasibility study in 2014
“Can we build one ‘unified’
thesaurus and if so: how?”
Feasibility study: main outcomes
●  One generic thesaurus won’t work
○  Huge discussions on content
○  Very niche thesauri vs. more generalist
●  But
○  What about SKOS
○  And what about linking thesauri
What about SKOS?
●  Standardized way for knowledge organisation
●  W3C, semantic web
●  Main contents
○  Concepts with descriptions
○  Hierarchy
○  Alternative names
○  Multilingual
○  ...and links: exactMatch, ...
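As a rough illustration of the SKOS building blocks listed above, the sketch below models a single concept in plain Python. The class, its field names and the example.org URIs are invented for illustration; only the SKOS property names in the comments (prefLabel, altLabel, definition, broader, exactMatch) come from the standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Concept:
    """Hypothetical in-memory view of one SKOS concept."""
    uri: str
    pref_label: Dict[str, str]                                      # skos:prefLabel, one per language
    alt_labels: Dict[str, List[str]] = field(default_factory=dict)  # skos:altLabel (alternative names)
    definition: str = ""                                            # skos:definition (description)
    broader: List[str] = field(default_factory=list)                # skos:broader (hierarchy)
    exact_match: List[str] = field(default_factory=list)            # skos:exactMatch (links)

    def labels(self, lang: str) -> List[str]:
        """All labels for one language, preferred label first."""
        pref = [self.pref_label[lang]] if lang in self.pref_label else []
        return pref + self.alt_labels.get(lang, [])

# Placeholder URIs; none of these identifiers exist in a real thesaurus.
bicycle = Concept(
    uri="http://example.org/voc1/bicycle",
    pref_label={"nl": "fiets", "en": "bicycle"},
    alt_labels={"nl": ["rijwiel"], "en": ["bike"]},
    definition="Human-powered vehicle with two wheels.",
    broader=["http://example.org/voc1/vehicle"],
    exact_match=["http://example.org/voc2/12345"],
)

print(bicycle.labels("nl"))
```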
What about linking thesauri?
voc1
voc2
•  Allows people to work on their own
thesaurus
•  Benefit from each other’s work
•  Allows unified search
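A minimal sketch of how links could enable unified search, assuming each collection is indexed by its own thesaurus terms. The term identifiers, item ids and function names are made up; the only idea taken from the slides is that a link lets a query on one vocabulary reach items annotated with the other.

```python
# Hypothetical links between two vocabularies (treated as symmetric).
LINKS = [
    ("voc1:migratie", "voc2:migration"),
    ("voc1:fiets", "voc2:bicycle"),
]

# Hypothetical index: thesaurus term -> items annotated with that term.
INDEX = {
    "voc1:migratie": ["vrt-001", "vrt-007"],
    "voc2:migration": ["bg-104"],
    "voc2:bicycle": ["bg-220"],
}

def equivalents(term, links):
    """Return the term itself plus every term directly linked to it."""
    found = {term}
    for a, b in links:
        if a == term:
            found.add(b)
        if b == term:
            found.add(a)
    return found

def unified_search(term, links, index):
    """Collect items from all collections via the term and its linked terms."""
    hits = []
    for t in sorted(equivalents(term, links)):
        hits.extend(index.get(t, []))
    return hits

print(unified_search("voc1:migratie", LINKS, INDEX))
```

Each organisation keeps maintaining its own vocabulary; only the list of links is shared.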
But linking
manually…?
About Sound & Vision
Audiovisual archive of the
Netherlands
> 800,000 hours of material
(TV, radio, music, documentaries,
film, commercials, etc.)
Archive + Access
GTAA!!
“Archive as Lab”
●  Smart
●  Connected
●  Open
About Taalunie
●  Support of Dutch
around the globe
●  Also active in the
field of digital
heritage
Linked Thesaurus for Uniform Dissemination
●  Source selection
●  SKOSification
●  Alignment (linking of the sources)
●  Demonstrator
With a little help from our friends
Source thesauri
GTAA (SKOS)
VRT Thesaurus
(not SKOS... yet)
VRT Thesaurus
100,000+ terms
Structured
Every term has an ID
Not standardized
Not published
Managed as part of AVID
Bit of a ‘black box’
VRT Thesaurus
Relations in the thesaurus
VRT Thesaurus
Hierarchical relations, to SKOS
(‘broader’ and ‘narrower’)
Also: related terms
Amsterdam in SKOS (turtle)
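The slide image is not part of this transcript. As a stand-in, here is a minimal sketch of how one term with hierarchical and related links might be written out as SKOS in Turtle. The term ids (t1000, t1001, t2042), the ex: namespace and the function are placeholders, not the actual VRT identifiers or conversion code.

```python
# Namespace prefixes; the ex: base URI is a placeholder, not a published namespace.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex:   <http://example.org/thesaurus/> .\n"
)

def to_turtle(term_id, label, lang="nl", broader=(), related=(), scope_note=None):
    """Serialise one term as a skos:Concept in Turtle.

    Labels and notes are assumed to contain no double quotes or backslashes;
    a real converter would need to escape them.
    """
    lines = [
        f"ex:{term_id} a skos:Concept ;",
        f'    skos:prefLabel "{label}"@{lang} ;',
    ]
    lines += [f"    skos:broader ex:{b} ;" for b in broader]
    lines += [f"    skos:related ex:{r} ;" for r in related]
    if scope_note:
        lines.append(f'    skos:scopeNote "{scope_note}"@{lang} ;')
    lines[-1] = lines[-1][:-1] + "."  # turn the final ';' into the closing '.'
    return "\n".join(lines)

# Hypothetical ids: t1000 stands for a broader place, t2042 for a related term.
amsterdam = to_turtle("t1001", "Amsterdam", broader=["t1000"], related=["t2042"])
print(PREFIXES + "\n" + amsterdam)
```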
VRT Thesaurus: scopeNotes
Synonyms
VRT Thesaurus after SKOSification
102,172 terms (concepts)
97,744 terms are linked hierarchically
4,429 topConcepts
212 scopeNotes
6,828 relations between terms
GTAA
184,484 terms
19,695 terms are linked hierarchically
9 conceptSchemes (persons, locations, …)
90,708 scopeNotes
33,542 relations
=> published as linked open data :)
Linking thesauri together
http://CultuurLINK.beeldengeluid.nl
Linking thesauri
Start working with two thesauri
Isolate a part of it (e.g. geographical names)
Compare the resulting terms
- String matching
- or more complex operations
Re-iterate on non-matching terms
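The steps above can be sketched in a few lines of Python. This is not how CultuurLINK is implemented; it is a minimal sketch, assuming each isolated part of a thesaurus is available as a mapping from term id to label. The normalisation rules, the `difflib` similarity cutoff and all ids are assumptions.

```python
import difflib
import unicodedata

def normalise(label):
    """Case-fold, strip accents and surrounding whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold().strip()

def link_terms(source, target, cutoff=0.9):
    """Compare two {term_id: label} mappings.

    Returns (exact, candidates, unmatched): exact string matches,
    near matches that still need human verification, and source
    terms left over for the next iteration.
    """
    by_label = {}
    for tid, label in target.items():
        by_label.setdefault(normalise(label), []).append(tid)

    exact, candidates, unmatched = [], [], []
    for sid, label in source.items():
        key = normalise(label)
        if key in by_label:                      # pass 1: string matching
            exact.extend((sid, tid) for tid in by_label[key])
            continue
        close = difflib.get_close_matches(key, list(by_label), n=1, cutoff=cutoff)
        if close:                                # pass 2: a more complex comparison
            candidates.extend((sid, tid) for tid in by_label[close[0]])
        else:
            unmatched.append(sid)
    return exact, candidates, unmatched

# Hypothetical geographical names from two thesauri.
source = {"v1": "Antwerpen", "v2": "Liège", "v3": "Brussel", "v4": "Gent"}
target = {"g1": "antwerpen", "g2": "Liege", "g3": "Brussels", "g4": "Rotterdam"}

exact, candidates, unmatched = link_terms(source, target)
print(exact, candidates, unmatched)
```

Near matches are kept apart from exact ones on purpose, so that a person who knows the sources can verify them before they become links.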
Linking subjects
Linking People (zoom)
FILTERS
COMPARISON
Linking People (zoom)
VERIFY
Linking People (zoom)
REPEAT
Linking People
Result of the linking process
Term                            # of links
Subjects (things)                    4,167
Names (bands, companies,...)         2,197
Locations                            4,011
Persons                             11,265
Total                               21,640
Learnings
●  Linking is not (very) technical
●  Linking requires knowledge of the sources
●  Linking requires human input
●  Richer thesaurus: more chance of finding
links
http://link.spinque.com/VIAA-1.0/
Demonstrator
Demonstrator - Sources
●  VIAA
○  Part of the VRT collection.
○  +/- 35,000 items (of 1 million records)
○  Annotations and links to the thesaurus
●  Sound & Vision
○  ‘Openbeelden’
○  Again: a small part of the archive
○  The annotation of those items.
SLIDER
determines weight of the collection
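The slides do not say how the demonstrator applies the slider. One plausible reading, sketched here purely as an assumption, is that the slider position scales each result's relevance score by a per-collection weight before ranking. The scores, item ids and the linear weighting are all invented.

```python
def rank(results, slider):
    """Rank results from two collections.

    results: list of (item_id, collection, relevance_score)
    slider:  0.0 = only VIAA counts, 1.0 = only Sound & Vision counts
    """
    weights = {"VIAA": 1.0 - slider, "Sound & Vision": slider}
    weighted = [(item, coll, score * weights[coll]) for item, coll, score in results]
    return sorted(weighted, key=lambda r: r[2], reverse=True)

# Hypothetical search results with made-up relevance scores.
results = [
    ("vrt-1", "VIAA", 0.9),
    ("bg-1", "Sound & Vision", 0.8),
    ("vrt-2", "VIAA", 0.5),
]

balanced = [item for item, _, _ in rank(results, 0.5)]
northward = [item for item, _, _ in rank(results, 0.8)]
print(balanced, northward)
```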
SEARCH
Keyword “migration”
All matching keywords of all search results (grouped & with indicator)
Search results
with highlighted
-  matches in
descriptions
-  matches in
thesaurus terms
collection indicator
Changed the weight towards the North
… more results
from across the
border
Clicked video is now the search criterion
Terms associated with all related videos
Related videos
User searches for related VRT content based
on the Sound & Vision (B&G) video.
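A minimal sketch of this cross-collection lookup, assuming the clicked video carries terms from one thesaurus and the links translate them into the other. The term ids, item ids and the ranking by number of shared terms are assumptions, not the demonstrator's actual logic.

```python
from collections import Counter

def related_items(clicked_terms, links, annotations):
    """Find items in the other collection that share linked terms.

    clicked_terms: thesaurus terms on the clicked video
    links:         {term: linked term in the other thesaurus}
    annotations:   {item_id: set of terms} for the other collection
    """
    translated = {links[t] for t in clicked_terms if t in links}
    overlap = Counter()
    for item, terms in annotations.items():
        shared = translated & terms
        if shared:
            overlap[item] = len(shared)
    # Items sharing the most linked terms come first.
    return [item for item, _ in overlap.most_common()]

# Hypothetical data: a Sound & Vision video annotated with GTAA terms.
clicked = {"gtaa:migratie", "gtaa:grens", "gtaa:onbekend"}
links = {"gtaa:migratie": "vrt:migratie", "gtaa:grens": "vrt:grens"}
annotations = {
    "vrt-1": {"vrt:migratie"},
    "vrt-2": {"vrt:migratie", "vrt:grens"},
    "vrt-3": {"vrt:sport"},
}

print(related_items(clicked, links, annotations))
```

Terms without a link (here `gtaa:onbekend`) simply contribute nothing, which is why a richer set of links yields more related results.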
Conclusions
●  Up until now: we gained a lot of insight into working with
thesauri
○  SKOS
○  Linking thesauri
○  Linked and open data
○  Got a good view of what the thesaurus landscape
looks like
●  Linking makes thesauri richer, allows for collaborative
work
Many more steps ahead
●  Select thesauri to start working from
○  Link them where appropriate
○  Reuse existing sources where possible (VIAF, GeoNames, …)
●  Manage
○  Good management tool and workflow for thesauri
●  Use (integration, integration, integration)
○  Use (public) LOD sources in our collection mgmt system (!)
○  Use them as reference source for term extraction
○  Re-use the terms in user-facing systems
Thanks!

Session 0.0 media panel - matthias priem - gtuo - semantics 2017