Estermann Wikidata and Heritage Data 20170914

Presentation on Wikidata and heritage data, given at the Open Cultural Data Hackathon 2017 in Lausanne.

  1. Wikidata & Heritage Data: Where do we stand? What’s next? Lausanne, 14 September 2017. Cover image: Sijie Dai, Captain Alving – Prix de Lausanne 2010. Photo by Inisheer, CC BY-SA (Wikimedia Commons). Unless otherwise noted, the content of this presentation is made available under the CC BY 4.0 license.
  2. Aim and Vision (WikiProject Cultural Heritage)
     ▶ The aim of this project is to coordinate, facilitate and promote the ingestion of cultural heritage related data into Wikidata, to facilitate the cleansing and enhancement of this data and to promote its use across Wikipedia, its sister projects and beyond.
     ▶ It is our vision to establish Wikidata as a central hub for data integration, data enhancement, and data management in the heritage domain.
  3. Vision (Blog posts: Stinson et al. 2016; Thornton / Cochrane 2016; Poulter 2017)
     ▶ Establish Wikidata as a database that covers the entire world’s cultural heritage.
     ▶ Establish Wikidata as a central hub that interlinks GLAM collections around the world and provides links to bibliographic, genealogical, scientific and other collections of information; create the ultimate authority file.
     ▶ Foster truly multilingual and global collaboration among people from various backgrounds.
     ▶ Leverage synergies between institutions, reduce duplicate work.
     ▶ Encourage debate in the community by highlighting and interrogating differences in perspective.
     ▶ Provide a single source of data for some of the most popular web sites and apps, including Wikipedia infoboxes and lists.
  4. Thematic Projects [Example]
  5. Current Challenges & Insights
  6. Core Aspects of the Project
     ▶ Core Processes: Data Ingestion, Data Provision, Ontology Development, Data Maintenance, Data Use
     ▶ Community & Collaboration
     ▶ Platforms & Tools
     ▶ Wikidata Within the Wider Data Landscape
  7. How to Get Access to Freely Licensed Data?
     ▶ Wikidata needs to be explained to institutions with a view to data donations.
        • Lack of awareness of the importance of open licenses in databases
        • Fears of loss of control related to publishing data under CC-0
        • What can institutions gain from their involvement in Wikidata?
     ▶ Community members need assistance with scraping data from websites.
     ▶ Present coverage is biased: it is highest for Western Europe and North America. How can data from other world regions be obtained?
  8. Data Provision – Which Datasets are Useful?
        • Personnalités Vaudoises (BCUL)
        • Swiss Photography Metadata (Büro für Fotografiegeschichte)
        • Artist data from the SIKART Lexicon on art in Switzerland (SIK-ISEA)
        • Metadata of the Historical Dictionary of Switzerland (HLS)
        • PCP Inventory (Federal Office for Civil Protection)
        • Inventory of Historical Monuments (Canton of Zurich)
        • Inventory of Historical Monuments (City of Zurich)
        • Inventory of classified Gardens and Parks (City of Zurich)
        • Art in the Urban Space (City of Zurich)
        • Swiss GLAM Inventory (OpenGLAM)
        • Inventory of Research Libraries in Switzerland (Swissbib)
        • ISplus Swiss (G)LAM Inventory (Swiss National Library)
        • Schauspielhaus Zürich Repertoire of Theatre and other Productions, 1938–1968
        • Swiss Theatre Metadata (Swiss Theatre Collection)
        • Plazi TreatmentBank (repository of the world's species)
        • Historical Statistics of Switzerland (University of Zurich)
  9. Challenges Related to Ontology Development (1/2). All rights reserved.
  10. Challenges Related to Ontology Development (2/2)
      ▶ Coping with the Bazaar:
         • Sometimes changes to property definitions are too easily made by volunteers.
         • There is a rigorous process for creating new properties, but not for changing definitions of properties or creating new classes.
         • No master language; how to keep translations of definitions in sync?
         • Sometimes different approaches are used to model the same thing.
      ▶ What are good design principles?
         • Re-usability of properties across various domains
         • Select high priority areas first; do not try to solve everything overnight for the entire cultural heritage domain
         • …
      ▶ Finding a balance between:
         • The expressive power of an ontology
         • Its practicability when it comes to large scale use by many people
         • Its queryability (usability from the perspective of data users)
  11. Challenges Related to Data Ingestion
      ▶ Mapping Between Data Models
         • Getting an overview of appropriate properties and classes can be a time-consuming exercise.
         • Creating new properties requires community agreement and may involve lengthy discussions and compromises.
         • There is still a lot of work to be done in the area of typologies and thesauri. [Example]
      ▶ Matching Items / Disambiguation
         • There are tools like Mix’n’Match and OpenRefine to support this, but it remains a major challenge, especially with datasets that have not resolved this issue internally.
      ▶ Incorrect / Incoherent Data on Wikidata
         • Many data ingestion projects require cleaning up existing data.
      ▶ Repeated Ingestion / Updates
         • How to approach the historicization of data?
         • How to set up processes to regularly update data?
      N.B.: We are not filling a void or starting from scratch, but contributing to an existing ecosystem of data, data models, and community members!
  12. Example: Data Cleansing
  13. Challenges Related to Data Maintenance
      ▶ Establishing and Documenting Data Quality
         • Getting rid of duplicates
         • Dealing with incorrect and inconsistent data
         • How to monitor data quality and data completeness?
      ▶ Building a Network of Trust
         • Linking all statements to a reliable source
         • In the future: “Signed Statements”
      ▶ Data Exchange Between Wikidata and Primary Databases
      ▶ Data synchronization: How to keep data mutually up to date?
      ▶ How to make it easier for GLAM employees to follow changes/improvements to their data on Wikidata?
  14. Challenges Related to Data Use
      ▶ Chicken and Egg Problem:
         • Data usage drives data quality & completeness
         • Data quality & completeness are prerequisites of data use
  15. [Example]
  16. Wikidata and the Wider Data Landscape
      ▶ Linking Wikidata with other databases
         • Map existing standards from the GLAM sector to Wikidata
         • Merge data imported from Wikipedia with data from reliable databases
      ▶ In what areas is Wikidata supposed to…
         • serve as the master database (referencing sources other than databases)?
         • hold data imported from reliable databases?
         • link to authoritative databases (without holding the actual data)?
      ▶ How should GLAMs organize their relationship with Wikidata?
         • Provide mutual links?
         • Ingest part or all of their data into Wikidata?
         • Synchronize part or all of their data with Wikidata?
         • Use Wikidata as their main database?
  17. Community & Collaboration
      ▶ How to improve guidelines, community structures, reporting etc. in order to be able to involve more GLAM personnel in Wikidata?
      ▶ How best to foster a shared data modelling practice in various areas? (Need for more modelling showcases, coordination, etc.)
      ▶ Need for training and tools (to facilitate the accomplishment of certain tasks).
      ▶ The evolving tools landscape constitutes a challenge when establishing processes and working with guidelines.
      ▶ Wikidata + GLAM Facebook Group
  18. Useful Tools
      ▶ Example: Tools I used for the ingestion of the Swiss GLAM Inventory:
         • Microsoft Excel / OpenOffice Calc
         • Wikidata Query Service
         • OpenRefine
         • Reconcile-csv
         • Listeria
         • Quick Statements
         • Microsoft Word / Excel (mail merge)
         • Hatnote: «Listen to Wikipedia»
  19. Tools – Wishlist
      ▶ Diff tools to help track changes in datasets on Wikidata and to synchronize with external databases
      ▶ Statistics tools (data completeness; data use)
      ▶ Data visualization tools (beyond what the Query Service can already do)
      ▶ Data tracking tools (data completeness; see how data evolves)
      ▶ Improved version of the Quick Statements Tool (see feature requests)
      ▶ Customizable forms for manual data entry
  20. Thank You for Your Attention! Contact: Beat Estermann, Bern University of Applied Sciences, +41 31 848 34 38
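
As a rough illustration of the "Matching Items / Disambiguation" challenge and the Quick Statements step listed under Useful Tools, the sketch below fuzzy-matches names from a local inventory against labels that already exist on Wikidata, and emits Quick Statements (version 1, tab-separated) commands only for entries without a match. It is not the workflow used in the talk. The sample labels, the placeholder item IDs Q1000001 and Q1000002, the 0.85 similarity cutoff and all function names are assumptions made for illustration; P31 is the "instance of" property, and Q33506 is assumed to be the item for "museum" and should be verified before use.

```python
import difflib

# Hypothetical sample data: labels already on Wikidata (item ID -> label)
# and rows from a local inventory. The item IDs are placeholders.
EXISTING = {
    "Q1000001": "Kunsthaus Zürich",
    "Q1000002": "Musée de l'Elysée",
}
INVENTORY = [
    {"name": "Kunsthaus Zurich", "type_qid": "Q33506"},
    {"name": "Museum Beispiel", "type_qid": "Q33506"},
]


def normalize(label):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(label.casefold().split())


def best_match(name, existing, cutoff=0.85):
    """Return (item ID, score) of the closest label, or (None, score) below cutoff."""
    best_qid, best_score = None, 0.0
    for qid, label in existing.items():
        score = difflib.SequenceMatcher(
            None, normalize(name), normalize(label)
        ).ratio()
        if score > best_score:
            best_qid, best_score = qid, score
    if best_score >= cutoff:
        return best_qid, best_score
    return None, best_score


def quickstatements_for_new(row, lang="en"):
    """Build Quick Statements v1 lines that create an item with a label and a type."""
    return [
        "CREATE",
        f'LAST\tL{lang}\t"{row["name"]}"',
        f'LAST\tP31\t{row["type_qid"]}',
    ]


def plan_ingest(inventory, existing):
    """Split rows into probable matches (for human review) and new-item commands."""
    matched, commands = [], []
    for row in inventory:
        qid, score = best_match(row["name"], existing)
        if qid:
            matched.append((row["name"], qid, round(score, 2)))
        else:
            commands.extend(quickstatements_for_new(row))
    return matched, commands


matched, commands = plan_ingest(INVENTORY, EXISTING)
print(matched)
print("\n".join(commands))
```

With this sample data, "Kunsthaus Zurich" is paired with the existing "Kunsthaus Zürich" entry, and only "Museum Beispiel" yields creation commands. A string-similarity cutoff is a deliberately crude stand-in for tools such as Mix’n’Match or OpenRefine; probable matches would still need human review before any upload.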