SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Wikidata as a linking hub for knowledge
organization systems?
Integrating an authority mapping into Wikidata and
learning lessons for KOS mapping
Joachim Neubert
ZBW – Leibniz Information Centre for Economics, Kiel/Hamburg
NKOS Workshop @ TPDL, Thessaloniki, Greece
21.9.2017
The idea of linking hubs
Page 2
Image by Jakob Voß (ELAG 2017)
Agenda
1. Suitability of Wikidata as linking hub
2. Experiences from moving an authority mapping to Wikidata
3. Outlook to knowledge organization systems
Page 3
Wikidata basics
 Knowledge base for Wikimedia projects
 All kinds of entities: concepts, places, people, works …
 Editable by everyone
 Data available (under CC0)
 http://query.wikidata.org/ (SPARQL)
 JSON API & database dumps
Page 4
Wikidata statements
Page 5
Page 6
Linking mechanism: external identifiers
 Property value: unique IDs from external database
 + URL stub in the property definition („formatter URL“)
 ~2,000 external identifier properties
 Examples:
 VIAF
 proteins
 African plants
 Swedish cultural heritage objects
 TED conference speakers
Page 7
Integrating GND – RePEc author mapping into
Wikidata
In the EconBiz economics search portal, authors are identified
differently:
 by GND ID in data from ZBW’s Econis catalog
 by RePEc Author ID in data from Research Papers for Economics
Large volumes: 450,000 vs. 50,000 distinct persons
~3,000 pairs of IDs discovered in a previous project
Page 8
Person identifiers at Wikidata and EconBiz
Page 9
unknown overlap at project start
Linking via Wikidata
Wikidata-Properties for both identifier systems
 GND ID (P227): ~375,000 items (humans only)
 RePEc Short-ID (RAS ID, P2428): ~2,200 items
Since every identifier should identify exactly one person, we can derive
 GND ID ⟶ Wikidata item ⟶ RAS ID
 RAS ID ⟶ Wikidata item ⟶ GND ID
where both properties have values (~760 items, as of 2017-04-25)
Page 10
Supplement WD items with missing IDs values
from mapping
Federated SPARQL queries revealed:
 384 WD items identified by RAS ID missing GND ID
 77 WD items identified by GND ID missing RAS ID
Adding missing IDs in bulk
 Transform to QuickStatements2 input file (SPARQL query, script)
 Copy & paste to QuickStatements2
Page 11
Bulk editing with QuickStatements2
Page 12
Further simplification with upcoming release of wdmapper command line tool
Add identifiers for „most important“ authors from
GND and RAS
 4,600 Top 10% economists scraped from RePEc ranking page
 18,000 GND authors with more than 30 publications in EconBiz
 Transform and load into Wikidata‘s Mix‘n‘match (RePEc Top, GND
economists (de))
 CSV file with ID, name, description
 Confirm match candidates
 Repeat adding missing identifiers from the mapping
Page 13
Checking proposed matches in Mix‘n‘match
Seite 14
Add missing Wikidata items from mapping
 Verify missing authors indeed are not in Wikidata
 Generate Wikidata item data from existing mapping in
QuickStatements2 input format (SPARQL query, script)
 2179 new Wikidata items created, synthesized with values:
 name from GND
 occupation “economist” from occurrence in RePEc Top 10%
 gender and date of birth/death (if available) from GND
 description from GND’s info field and RePEc’s “works for”
Page 15
Recommendations for item creation
 Pay attention to Wikidata’s notability criteria
 Explain your plan and ask for feedback in the Wikidata project chat
 Apply for a bot account to make mass edits (example)
 Source every statement (hints)
 Use QuickStatements2 (more convenient input format, batch mode,
sources)
Page 16
Result: the mapping in Wikidata
4087 Wikidata items with both GND and RAS IDs
 3081 matches from ZBW’s mapping
 1006 matches contributed by Wikidata users
(as of 2017-06-04)
Page 17
RePEc “Top economists”: 60 % coverage
Page 18
Additional immediate benefits
 Links to multilingual human-readable Wikipedia pages about authors
(~1470 English, ~680 German, ~270 Spanish, etc.)
 Additional data from Wikidata
 Mappings to other authorities for free (e.g., ~1,600 RAS ⟷ VIAF)
Page 19
Strategic benefits
 Outsourced interface, storage and operation
 Crowdsourced mapping maintenance
 Wikidata has policies and tools for data quality
 Identifiers and items inserted by individual contributors or systematic
efforts add up continuously and are available as Open Data
Page 20
Mapping knowledge organization systems
External identifier properties for thesauri and classifications exist, e.g.
 GND subject headings
 Art & Architecture Thesaurus (Getty)
 UNESCO Thesaurus
 DDC classes
Upcoming: Mapping of „STW Thesaurus for Economics“ to Wikidata –
started with STW sub-thesaurus „Geographical Names“
Page 21
Beyond sameness – mapping relations
In rare cases (for locations), different mapping relations are required:
 broad or narrow matches – e.g.,
„Lake Constance“ (Wikidata) < „Lake Constance region“ (STW)
 close matches – e.g.,
„Overseas territories“ (Wikidata) ≅ „Overseas territories“ (STW)
(„territories which have a special relationship with one of the member
states of the EU“ in Wikidata, not defined in STW)
Property proposal under discussion (SKOS mapping relations as
optional qualifiers for external id properties)
Page 22
Workflow considerations
 Existing concordances (e.g., STW ./. GND) can be exploited for the
creation of „mapping candidates“, see query based on
STW descriptor <–(skos:exactMatch)–> GND subject heading <–(wdt:P227)–> Wikidata item
• 2,034 candiates for STW‘s 5,339 non-geographical descriptors
• Evaluation of 50 randomly selected entries revealed only 2 false, 7
more not exactly matching
• Therefore, automatic creation of these STW properties and
intellectual check afterwards is an option
• Remainder can be handled by Mix‘n‘match
Page 23
Conclusions
Page 24
 Use of Wikidata as a linking hub for KOS looks very promising
 Different from existing “universal” KOS (think DDC), Wikidata is
easily extensible
 Tools for matching as well as for consistency checks are in place
 Crowd contribution enabled and embraced
Page 25
Thanks for listening!
Joachim Neubert
ZBW – Leibniz Information Centre for Economics
j.neubert@zbw.eu
http://zbw.eu/labs
https://github.com/zbw/repec-ras
https://github.com/zbw/sparql-queries/tree/master/wikidata

Weitere ähnliche Inhalte

Was ist angesagt?

Iaos From Data Access To Data Integration
Iaos From Data Access To Data IntegrationIaos From Data Access To Data Integration
Iaos From Data Access To Data Integration
annegrete
 
Archiving Of Electronic Publishing
Archiving Of Electronic PublishingArchiving Of Electronic Publishing
Archiving Of Electronic Publishing
annegrete
 

Was ist angesagt? (20)

Iaos From Data Access To Data Integration
Iaos From Data Access To Data IntegrationIaos From Data Access To Data Integration
Iaos From Data Access To Data Integration
 
Big Data Projects
Big Data ProjectsBig Data Projects
Big Data Projects
 
IFLA Semantic Web at the BCN, 15.08.2012
IFLA Semantic Web at the BCN, 15.08.2012IFLA Semantic Web at the BCN, 15.08.2012
IFLA Semantic Web at the BCN, 15.08.2012
 
BDE SC6-pilot - 05/12/16 - cologne Michalis Vafopoulos
BDE SC6-pilot - 05/12/16 - cologne Michalis VafopoulosBDE SC6-pilot - 05/12/16 - cologne Michalis Vafopoulos
BDE SC6-pilot - 05/12/16 - cologne Michalis Vafopoulos
 
The GetLOD Story
The GetLOD StoryThe GetLOD Story
The GetLOD Story
 
Linking the 20th century paper history to the sum of all knowledge
Linking the 20th century paper history to the sum of all knowledgeLinking the 20th century paper history to the sum of all knowledge
Linking the 20th century paper history to the sum of all knowledge
 
Big data Europe: concept, platform and pilots
Big data Europe: concept, platform and pilotsBig data Europe: concept, platform and pilots
Big data Europe: concept, platform and pilots
 
Wikidata as opportunity for special collections: the 20th Century Press Archi...
Wikidata as opportunity for special collections: the 20th Century Press Archi...Wikidata as opportunity for special collections: the 20th Century Press Archi...
Wikidata as opportunity for special collections: the 20th Century Press Archi...
 
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
 
Publishing XBRL as Linked Open Data
Publishing XBRL as Linked Open DataPublishing XBRL as Linked Open Data
Publishing XBRL as Linked Open Data
 
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
 
Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4
 
2016 SDMX Experts meeting, SDMX Design & Modelling, Daniel Suranyi
2016 SDMX Experts meeting, SDMX Design & Modelling, Daniel Suranyi2016 SDMX Experts meeting, SDMX Design & Modelling, Daniel Suranyi
2016 SDMX Experts meeting, SDMX Design & Modelling, Daniel Suranyi
 
Big data value policy context and public private partnership
Big data value policy context and public private partnershipBig data value policy context and public private partnership
Big data value policy context and public private partnership
 
SC1 Workshop 2 Technical overview
SC1 Workshop 2 Technical overviewSC1 Workshop 2 Technical overview
SC1 Workshop 2 Technical overview
 
BDE SC6.2 Workshop-05/12/16 - CESSDA
BDE SC6.2 Workshop-05/12/16 - CESSDABDE SC6.2 Workshop-05/12/16 - CESSDA
BDE SC6.2 Workshop-05/12/16 - CESSDA
 
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
 
Semantic MediaWiki and Open Data
Semantic MediaWiki and Open DataSemantic MediaWiki and Open Data
Semantic MediaWiki and Open Data
 
Managing and Consuming Completeness Information for Wikidata Using COOL-WD
Managing and Consuming Completeness Information for Wikidata Using COOL-WDManaging and Consuming Completeness Information for Wikidata Using COOL-WD
Managing and Consuming Completeness Information for Wikidata Using COOL-WD
 
Archiving Of Electronic Publishing
Archiving Of Electronic PublishingArchiving Of Electronic Publishing
Archiving Of Electronic Publishing
 

Ähnlich wie Wikidata as a linking hub for knowledge organization systems? Integrating an authority mapping into Wikidata and learning lessons for KOS mapping

Semantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph InterfaceSemantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph Interface
Bernhard Krabina
 

Ähnlich wie Wikidata as a linking hub for knowledge organization systems? Integrating an authority mapping into Wikidata and learning lessons for KOS mapping (20)

Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
 
Wikidata and Semantic MediaWiki
Wikidata and Semantic MediaWiki Wikidata and Semantic MediaWiki
Wikidata and Semantic MediaWiki
 
Wikidata as a hub for the linked data cloud
Wikidata as a hub for the linked data cloudWikidata as a hub for the linked data cloud
Wikidata as a hub for the linked data cloud
 
SMW Introduction Wikimedia Hackathon Krabina 2024.pdf
SMW Introduction Wikimedia Hackathon Krabina 2024.pdfSMW Introduction Wikimedia Hackathon Krabina 2024.pdf
SMW Introduction Wikimedia Hackathon Krabina 2024.pdf
 
Linking authorities through Wikidata
Linking authorities through WikidataLinking authorities through Wikidata
Linking authorities through Wikidata
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
MWCon KnowledgeGraph Extension Krabina 2024.pdf
MWCon KnowledgeGraph Extension Krabina 2024.pdfMWCon KnowledgeGraph Extension Krabina 2024.pdf
MWCon KnowledgeGraph Extension Krabina 2024.pdf
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
 
Semantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph InterfaceSemantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph Interface
 
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
Validating and Describing Linked Data Portals using RDF Shape Expressions
Validating and Describing Linked Data Portals using RDF Shape ExpressionsValidating and Describing Linked Data Portals using RDF Shape Expressions
Validating and Describing Linked Data Portals using RDF Shape Expressions
 
BigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE PlatformBigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE Platform
 
Making Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index StructuresMaking Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index Structures
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 
Managing and Consuming Completeness Information for Wikidata Using COOL-WD
Managing and Consuming Completeness Information for Wikidata Using COOL-WDManaging and Consuming Completeness Information for Wikidata Using COOL-WD
Managing and Consuming Completeness Information for Wikidata Using COOL-WD
 
Linked Open Data with Semantic MediaWiki
Linked Open Data with Semantic MediaWikiLinked Open Data with Semantic MediaWiki
Linked Open Data with Semantic MediaWiki
 
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Wikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsWikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization Systems
 

Mehr von Joachim Neubert

Wikidata as authority linking hub
Wikidata as authority linking hubWikidata as authority linking hub
Wikidata as authority linking hub
Joachim Neubert
 
Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...
Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...
Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...
Joachim Neubert
 
Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)
Joachim Neubert
 
Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)
Joachim Neubert
 

Mehr von Joachim Neubert (20)

Exploring and mapping the category system of the world‘s largest public press...
Exploring and mapping the category system of the world‘s largest public press...Exploring and mapping the category system of the world‘s largest public press...
Exploring and mapping the category system of the world‘s largest public press...
 
Donating data to Wikidata: First experiences from the „20th Century Press Arc...
Donating data to Wikidata: First experiences from the „20th Century Press Arc...Donating data to Wikidata: First experiences from the „20th Century Press Arc...
Donating data to Wikidata: First experiences from the „20th Century Press Arc...
 
Wikidata (für Archive)
Wikidata (für Archive)Wikidata (für Archive)
Wikidata (für Archive)
 
20th Century Press Archives goes Wikidata
20th Century Press Archives goes Wikidata20th Century Press Archives goes Wikidata
20th Century Press Archives goes Wikidata
 
Chancen und Herausforderungen einer komplementären Nutzung von GND und Wikidata
Chancen und Herausforderungen einer komplementären Nutzung von GND und WikidataChancen und Herausforderungen einer komplementären Nutzung von GND und Wikidata
Chancen und Herausforderungen einer komplementären Nutzung von GND und Wikidata
 
Pressemappe 20. Jahrhundert: Personen- und Firmendossiers
Pressemappe 20. Jahrhundert: Personen- und FirmendossiersPressemappe 20. Jahrhundert: Personen- und Firmendossiers
Pressemappe 20. Jahrhundert: Personen- und Firmendossiers
 
20th Century Press Archives goes Wikidata
20th Century Press Archives goes Wikidata20th Century Press Archives goes Wikidata
20th Century Press Archives goes Wikidata
 
Making Wikidata fit as a Linking Hub for Knowledge Organization Systems
Making Wikidata fit as a Linking Hub for Knowledge Organization SystemsMaking Wikidata fit as a Linking Hub for Knowledge Organization Systems
Making Wikidata fit as a Linking Hub for Knowledge Organization Systems
 
Wikidata as authority linking hub
Wikidata as authority linking hubWikidata as authority linking hub
Wikidata as authority linking hub
 
EconBiz Research Dataset (SWIB16 Lightning Talk)
EconBiz Research Dataset (SWIB16 Lightning Talk)EconBiz Research Dataset (SWIB16 Lightning Talk)
EconBiz Research Dataset (SWIB16 Lightning Talk)
 
Using Wikidata as an Authority for the SowiDataNet Research Data Repository
Using Wikidata as an Authority for the SowiDataNet Research Data RepositoryUsing Wikidata as an Authority for the SowiDataNet Research Data Repository
Using Wikidata as an Authority for the SowiDataNet Research Data Repository
 
Change Tracking in Knowledge Organization Systems with skos-history
Change Tracking in Knowledge Organization Systems with skos-historyChange Tracking in Knowledge Organization Systems with skos-history
Change Tracking in Knowledge Organization Systems with skos-history
 
Anforderungen an Thesauri im Semantic Web
Anforderungen an Thesauri im Semantic WebAnforderungen an Thesauri im Semantic Web
Anforderungen an Thesauri im Semantic Web
 
Leveraging SKOS to trace the overhaul of the STW Thesaurus for Economics
Leveraging SKOS to trace the overhaul of the STW Thesaurus for EconomicsLeveraging SKOS to trace the overhaul of the STW Thesaurus for Economics
Leveraging SKOS to trace the overhaul of the STW Thesaurus for Economics
 
skos-history: Tracking the evolution of Knowledge Organization Systems
skos-history: Tracking the evolution of Knowledge Organization Systemsskos-history: Tracking the evolution of Knowledge Organization Systems
skos-history: Tracking the evolution of Knowledge Organization Systems
 
KOS evolution in Linked Data
KOS evolution in Linked DataKOS evolution in Linked Data
KOS evolution in Linked Data
 
Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...
Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...
Exploiting the version history of SKOS files: skos-history (SWIB13 Lightning ...
 
Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)
 
Constantly Under Construction: STW Thesaurus for Economics Linked Data Maint...
Constantly Under Construction: STW Thesaurus for Economics Linked Data Maint...Constantly Under Construction: STW Thesaurus for Economics Linked Data Maint...
Constantly Under Construction: STW Thesaurus for Economics Linked Data Maint...
 
Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)
 

Kürzlich hochgeladen

哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
ydyuyu
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
pxcywzqs
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理
F
 
一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理
F
 
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiAbu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Monica Sydney
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Monica Sydney
 

Kürzlich hochgeladen (20)

哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
Call girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsCall girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girls
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理
 
Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...
Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...
Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...
 
一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call GirlsMira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
 
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiAbu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
 

Wikidata as a linking hub for knowledge organization systems? Integrating an authority mapping into Wikidata and learning lessons for KOS mapping

  • 1. Wikidata as a linking hub for knowledge organization systems? Integrating an authority mapping into Wikidata and learning lessons for KOS mapping Joachim Neubert ZBW – Leibniz Information Centre for Economics, Kiel/Hamburg NKOS Workshop @ TPDL, Thessaloniki, Greece 21.9.2017
  • 2. The idea of linking hubs Page 2 Image by Jakob Voß (ELAG 2017)
  • 3. Agenda 1. Suitability of Wikidata as linking hub 2. Experiences from moving an authority mapping to Wikidata 3. Outlook to knowledge organization systems Page 3
  • 4. Wikidata basics  Knowledge base for Wikimedia projects  All kinds of entities: concepts, places, people, works …  Editable by everyone  Data available (under CC0)  http://query.wikidata.org/ (SPARQL)  JSON API & database dumps Page 4
  • 7. Linking mechanism: external identifiers  Property value: unique IDs from external database  + URL stub in the property definition („formatter URL“)  ~2,000 external identifier properties  Examples:  VIAF  proteins  African plants  Swedish cultural heritage objects  TED conference speakers Page 7
  • 8. Integrating GND – RePEc author mapping into Wikidata In the EconBiz economics search portal, authors are identified differently:  by GND ID in data from ZBW’s Econis catalog  by RePEc Author ID in data from Research Papers for Economics Large volumes: 450,000 vs. 50,000 distinct persons ~3,000 pairs of IDs discovered in a previous project Page 8
  • 9. Person identifiers at Wikidata and EconBiz Page 9 unknown overlap at project start
  • 10. Linking via Wikidata Wikidata-Properties for both identifier systems  GND ID (P227): ~375,000 items (humans only)  RePEc Short-ID (RAS ID, P2428): ~2,200 items Since every identifier should identify exactly one person, we can derive  GND ID ⟶ Wikidata item ⟶ RAS ID  RAS ID ⟶ Wikidata item ⟶ GND ID where both properties have values (~760 items, as of 2017-04-25) Page 10
  • 11. Supplement WD items with missing IDs values from mapping Federated SPARQL queries revealed:  384 WD items identified by RAS ID missing GND ID  77 WD items identified by GND ID missing RAS ID Adding missing IDs in bulk  Transform to QuickStatements2 input file (SPARQL query, script)  Copy & paste to QuickStatements2 Page 11
  • 12. Bulk editing with QuickStatements2 Page 12 Further simplification with upcoming release of wdmapper command line tool
  • 13. Add identifiers for „most important“ authors from GND and RAS  4,600 Top 10% economists scraped from RePEc ranking page  18,000 GND authors with more than 30 publications in EconBiz  Transform and load into Wikidata‘s Mix‘n‘match (RePEc Top, GND economists (de))  CSV file with ID, name, description  Confirm match candidates  Repeat adding missing identifiers from the mapping Page 13
  • 14. Checking proposed matches in Mix‘n‘match Seite 14
  • 15. Add missing Wikidata items from mapping  Verify missing authors indeed are not in Wikidata  Generate Wikidata item data from existing mapping in QuickStatements2 input format (SPARQL query, script)  2179 new Wikidata items created, synthesized with values:  name from GND  occupation “economist” from occurrence in RePEc Top 10%  gender and date of birth/death (if available) from GND  description from GND’s info field and RePEc’s “works for” Page 15
  • 16. Recommendations for item creation  Pay attention to Wikidata’s notability criteria  Explain your plan and ask for feedback in the Wikidata project chat  Apply for a bot account to make mass edits (example)  Source every statement (hints)  Use QuickStatements2 (more convenient input format, batch mode, sources) Page 16
  • 17. Result: the mapping in Wikidata 4087 Wikidata items with both GND and RAS IDs  3081 matches from ZBW’s mapping  1006 matches contributed by Wikidata users (as of 2017-06-04) Page 17
  • 18. RePEc “Top economists”: 60 % coverage Page 18
  • 19. Additional immediate benefits  Links to multilingual human-readable Wikipedia pages about authors (~1470 English, ~680 German, ~270 Spanish, etc.)  Additional data from Wikidata  Mappings to other authorities for free (e.g., ~1,600 RAS ⟷ VIAF) Page 19
  • 20. Strategic benefits  Outsourced interface, storage and operation  Crowdsourced mapping maintenance  Wikidata has policies and tools for data quality  Identifiers and items inserted by individual contributors or systematic efforts add up continuously and are available as Open Data Page 20
  • 21. Mapping knowledge organization systems External identifier properties for thesauri and classifications exist, e.g.  GND subject headings  Art & Architecture Thesaurus (Getty)  UNESCO Thesaurus  DDC classes Upcoming: Mapping of „STW Thesaurus for Economics“ to Wikidata – started with STW sub-thesaurus „Geographical Names“ Page 21
  • 22. Beyond sameness – mapping relations In rare cases (for locations), different mapping relations are required:  broad or narrow matches – e.g., „Lake Constance“ (Wikidata) < „Lake Constance region“ (STW)  close matches – e.g., „Overseas territories“ (Wikidata) ≅ „Overseas territories“ (STW) („territories which have a special relationship with one of the member states of the EU“ in Wikidata, not defined in STW) Property proposal under discussion (SKOS mapping relations as optional qualifiers for external id properties) Page 22
  • 23. Workflow considerations  Existing concordances (e.g., STW ./. GND) can be exploited for the creation of „mapping candidates“, see query based on STW descriptor <–(skos:exactMatch)–> GND subject heading <–(wdt:P227)–> Wikidata item • 2,034 candiates for STW‘s 5,339 non-geographical descriptors • Evaluation of 50 randomly selected entries revealed only 2 false, 7 more not exactly matching • Therefore, automatic creation of these STW properties and intellectual check afterwards is an option • Remainder can be handled by Mix‘n‘match Page 23
  • 24. Conclusions Page 24  Use of Wikidata as a linking hub for KOS looks very promising  Different from existing “universal” KOS (think DDC), Wikidata is easily extensible  Tools for matching as well as for consistency checks are in place  Crowd contribution enabled and embraced
  • 25. Page 25 Thanks for listening! Joachim Neubert ZBW – Leibniz Information Centre for Economics j.neubert@zbw.eu http://zbw.eu/labs https://github.com/zbw/repec-ras https://github.com/zbw/sparql-queries/tree/master/wikidata