Digital game preservation conference 12 25-2018
1. Should Wikidata / Wikibase be part of our solution?
International Digital Game Preservation 2019 Symposium
Peter Chan, Digital Archivist
Stanford University Libraries
2. Peter Chan
● Born-digital Lab Manager
○ 8-inch floppy disk drive
● Project manager for ePADD project
○ Software that supports the appraisal, processing, discovery, and delivery of email archives.
○ National Digital Stewardship Alliance (NDSA) Innovation Award 2017 (USA)
○ Digital Preservation Coalition (DPC) Software Sustainability Institute Award for Research and
Innovation 2018 (UK)
● Visiting digital archivist at the Royal Library of Copenhagen (4 weeks 2015)
○ Archiving of emails
○ Processing and delivery of born-digital materials using forensic software
○ Preservation of video games
5. Projects Related to Digital Preservation
● Preserving Virtual Worlds (PVW) I & II (2007-2013)
● An Inter-Institutional Model for Stewardship (AIMS) (2010-2012)
● Cabrinety-NIST (National Institute of Standards and Technology)
(2012-2015)
● Game Metadata and Citation Project (GAMECIP) (2014-2017)
● Email: Process, Appraise, Discover, Deliver (ePADD) (2014-2018)
● Scaling Emulation as a Service Infrastructure (EaaSI) (2017-2020)
6. Digital Game Preservation
● What exactly are we trying to preserve?
● How do we preserve our holdings?
● How do we let people know about our collections?
● How can people use the resources?
7. Preserve / Collect what?
● Created by creator(s) of the game
○ Software
○ Source code
○ Technical documentation
○ Production materials
○ Designer stories
○ Permission to break digital rights management for preservation copy
○ Permission to copy by anyone, etc.
8. Collect what?
● Created by third parties
○ Printed catalog
○ Strategy guides
○ Demo file (recording of a game session)
○ Records of interaction with the user community
○ Books, video, blog, articles, etc. about the games
○ Game music produced by fans or original live performances
15. How can we preserve our collections?
● Copy of software (fulfill exemption criteria to break DRM)
● Convert digital object into a media-neutral format (IMG or ISO)
● Perform source migration
● Store copies in preservation repository with metadata (different levels)
● Create emulator (Internet Archive, Ritsumeikan University, BnF, etc.)
● Provide emulation platform (Yale)
● Digitize materials in different formats (paper, tape, etc.)
● Archive born-digital materials such as websites, online videos, emails, etc.
16. Levels of Digital Preservation
● A tiered set of recommendations by the National Digital Stewardship Alliance (NDSA) in the USA for how organizations should begin to build or enhance their digital preservation activities.
19. Stanford Digital Repository (SDR)
● Administrative controls enable depositors to specify content licenses, control
content release through embargo, and manage public access levels for finding,
viewing and downloading content.
● Metadata describing the content is indexed for search and discovery in
SearchWorks, and copies of ingested content are provided via persistent URLs
(PURLs) to authorized users.
● Each digital object in SDR is stored redundantly in geo-diverse locations and
audited systematically to ensure bit preservation.
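The systematic audit described above can be illustrated with a fixity check. This is a minimal sketch, assuming SHA-256 checksums are recorded at ingest. The function names and the manifest layout are hypothetical and do not describe SDR's actual implementation.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest):
    """Return the paths whose current checksum differs from the recorded one.

    `manifest` maps file path -> checksum recorded at ingest (assumed layout).
    """
    return [path for path, recorded in manifest.items()
            if sha256_of(path) != recorded]


def demo():
    """Write a stand-in disk image, record its checksum, then alter the file."""
    with tempfile.TemporaryDirectory() as folder:
        image = os.path.join(folder, "disk.img")
        with open(image, "wb") as handle:
            handle.write(b"original bits")
        manifest = {image: sha256_of(image)}
        clean = audit(manifest)            # nothing has changed yet
        with open(image, "wb") as handle:
            handle.write(b"flipped bits")  # simulate bit-level damage
        damaged = audit(manifest)
        return clean, [os.path.basename(p) for p in damaged]
```

Running `audit` on a schedule against each redundant copy is one way to detect silent corruption before every copy is affected.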
20. Discover what? By whom?
● Resource (work, expression, manifestation, item)
● Metadata needed for game preservation (PVW)
● User experience on video games (GAMER)
● Relationships among video games (GAMER, Fukuda & Mihara)
● Big picture of the holdings (quantitative analysis)
● Audience
○ players, researchers, curators
○ international / domestic
○ public / internal
21. What tools are people using?
● Items (holdings)
○ Finding Aids (e.g. https://oac.cdlib.org/findaid/ark:/13030/kt529018f2/)
○ Tables in Websites (http://www.dh-jac.net/db/rcgs/search.php)
○ MARC 21 records (https://searchworks.stanford.edu/view/7644450)
○ Dublin Core records (https://lgira.mesmernet.org/items/show/648)
○ Non-standard database (https://archive.org/details/softwarelibrary_msdos_games)
○ Internal use museum system - Computer Game Museum in Berlin
○ Internal use spreadsheet, database.
● Works (information about video games)
○ Complex database (https://www.mobygames.com/game/doom__)
○ Hyperlink in html page (https://en.wikipedia.org/wiki/Doom_(1993_video_game))
22. Need better information retrieval
● Atari 2600 vs Atari VCS vs Sears Video Arcade vs. Sears Tele-Games
Video Arcade
● Nintendo DS vs 닌텐도 DS vs 任天堂DS vs ニンテンドーDS
● CD vs. Compact Disc vs. CD-ROM vs. Compact Disc Read Only Memory
● Steve Meretzky vs. S. Eric Meretzky vs. Steven Meretzky
23. Controlled Vocabularies / Authority Files
● Library of Congress Name Authority File
● International Standard Name Identifier (ISNI - OCLC)
● Library of Congress Subject Headings
● GAMER Group: Gameplay, Mood, Visual Style, etc.
● GAMECIP Project: Platform, Storage Media
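The retrieval benefit of a controlled vocabulary can be sketched as a lookup from variant labels to one preferred label. The variants below are the ones listed on the previous slide. The table layout, the function names, and the choice of which label counts as preferred are assumptions made for illustration only.

```python
# Hypothetical authority table: preferred label -> known variant labels.
AUTHORITY = {
    "Atari 2600": ["Atari VCS", "Sears Video Arcade",
                   "Sears Tele-Games Video Arcade"],
    "Nintendo DS": ["닌텐도 DS", "任天堂DS", "ニンテンドーDS"],
    "CD-ROM": ["CD", "Compact Disc", "Compact Disc Read Only Memory"],
    "Steve Meretzky": ["S. Eric Meretzky", "Steven Meretzky"],
}


def _key(label):
    """Case-fold and collapse whitespace so trivial differences still match."""
    return " ".join(label.casefold().split())


def build_index(authority):
    """Map every preferred and variant label to its preferred label."""
    index = {}
    for preferred, variants in authority.items():
        for label in [preferred, *variants]:
            index[_key(label)] = preferred
    return index


INDEX = build_index(AUTHORITY)


def normalize(label):
    """Return the preferred label, or None when the label is not in the table."""
    return INDEX.get(_key(label))
```

With such a table, a search for "Atari VCS" and a search for "Sears Video Arcade" both resolve to the same heading, so records described under either name are retrieved together.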
24. What Does Stanford Use Now?
● Finding aids in OAC (Online Archive of California)
● Records based on MARC 21 in Stanford’s catalog systems
● Web tables in ivdn.org (International Videogame Data Network)
● GAMECIP Project: Platform, Storage Media controlled vocabularies
● Library of Congress Authority Files / Subject Headings
25. How can we deliver preserved materials?
● Public or designated community
● Software in original package / other physical objects at the institution
● Play the software using original hardware at the institution
● Play the emulated or virtualized version at the institution
● Play the emulated or virtualized version on the web
● EaaSI (Emulation as a Service Infrastructure)
● Allow download of software
● Look at scanned objects on the web
● Read / watch / listen to born-digital materials related to the game
31. Difficult for me to get the information
● Finding Aid
○ M0997 https://oac.cdlib.org/findaid/ark:/13030/kt529018f2/entire_text/
○ M2080 https://oac.cdlib.org/findaid/ark:/13030/c8mg7vkj/entire_text/
● Excel file
● Web Table
○ Table in Website https://ivdn.org/stanford-university/
● Catalog Records
○ Many games in one record https://searchworks.stanford.edu/view/4534811
○ One game one record https://searchworks.stanford.edu/view/7644920
● Stanford Digital Repository
○ https://searchworks.stanford.edu/view/kf110fv4329
32. Alternatives to solve the problem
● Catalog each item in finding aids
○ Resources not prioritized for cataloging 20,000 items
● Create an internal spreadsheet for items in finding aids
○ Messy solution
● Create a database to hold information on items
○ Possible solution
34. Need for Controlled Vocabularies
● Atari 2600 vs Atari VCS vs Sears Video Arcade vs. Sears Tele-Games
Video Arcade
● Nintendo DS vs 닌텐도 DS vs 任天堂DS vs ニンテンドーDS
● CD vs. Compact Disc vs. CD-ROM vs. Compact Disc Read Only Memory
● Steve Meretzky vs. S. Eric Meretzky vs. Steven Meretzky
35. Publishing Controlled Vocabulary as LOD
● http://metadataregistry.org/concept/show/id/7018.html
○ Persistent url
○ Multilingual
○ Recommended standard by W3C for controlled vocabulary
○ Editing controlled by creator
● https://gamemetadata.soe.ucsc.edu/platform/1140
○ Show images
○ Editing controlled by creator
● https://gamemetadata.soe.ucsc.edu/platform
○ Apple Mac OS X v10.11, v 10.12, v10.13, v10.14 missing
○ Fairchild Channel F. missing
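The W3C standard for controlled vocabularies referred to above is SKOS. The following sketch shows how one concept with multilingual labels might be serialized as Turtle. The helper names and the `example.org` URI are hypothetical, and the output is not taken from the Open Metadata Registry or the GAMECIP site.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


def turtle_literal(text, lang):
    """Quote a label as a language-tagged Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skos_concept(uri, pref_labels, alt_labels=()):
    """Serialize one SKOS concept as Turtle.

    pref_labels: dict of language tag -> label. A dict keeps this to one
                 preferred label per language, as SKOS expects.
    alt_labels:  iterable of (language tag, label) pairs.
    """
    lines = [f"<{uri}> a skos:Concept"]
    for lang, label in pref_labels.items():
        lines.append(f"    skos:prefLabel {turtle_literal(label, lang)}")
    for lang, label in alt_labels:
        lines.append(f"    skos:altLabel {turtle_literal(label, lang)}")
    return SKOS_PREFIX + " ;\n".join(lines) + " ."


# Example with a placeholder URI and labels taken from the slides.
EXAMPLE = skos_concept(
    "http://example.org/platform/nintendo-ds",
    {"en": "Nintendo DS", "ja": "ニンテンドーDS", "ko": "닌텐도 DS"},
    alt_labels=[("zh", "任天堂DS")],
)
```

Printing `EXAMPLE` gives a small Turtle document that any SKOS-aware tool should be able to load, which is what makes the vocabulary reusable as linked open data.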
36. Issues on publishing controlled vocabulary
● Open Metadata Registry - supported by a consulting firm
● Open Metadata Registry - does not support images in the schema
● Redirect to another site to show images is messy
● Requires special technical resources to host a web-based, collaborative
development platform for managing SKOS, such as VocBench 3
● Long-term sustainability of open source projects such as VocBench 3 and
OntoWiki is uncertain
● National libraries are slow to respond, or do not respond, to our requests
37. Not just these problems, there are others
● Need preservation metadata in catalog record (PVW)
● Need metadata on relationship among video games (GAMER, Fukuda & Mihara)
● Need information on user experience in catalog record (GAMER)
● Need multilingual interface for international audience
● Need unique web identifier so that other people can link to the records
● Need easy way for collaboration
● Need easy way to search, browse, and perform quantitative analysis across
projects
38. Need to find a solution that addresses multiple issues in discovery
39. Factors to Consider in Discovery
● Structured vs. Unstructured data
● Centralised vs. Decentralised
● Persistent web identifier vs. Local Id
● Allow public to edit vs. Restrict editing to a designated group
● Changes tracked vs. Changes not tracked
● Single model vs. Multi-models (cataloging)
● Linked data format vs. Other formats
● Platform with a more certain future vs. a less certain future
● Multilingual vs. Single language
40. What is Wikidata?
● Wikidata, launched in 2012, is a collaboratively edited knowledge base
hosted by the Wikimedia Foundation.
● It is intended to provide a common source of open data which can be
used by Wikimedia projects such as Wikipedia, and by anyone else,
under a public domain license.
● Factual claims are stored as statements
○ Subject - predicate - object
○ Item - property - value (e.g. DOOM - instance of - video game)
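The item - property - value structure can be sketched as a small in-memory list of statements. Q189784 (Doom) and Q7889 (video game) are the identifiers used elsewhere in these slides, and P31 ("instance of") and P577 ("publication date") are Wikidata properties. The tuple layout, the helper functions, and the simplified year value are assumptions for illustration, not Wikidata's internal format.

```python
from collections import namedtuple

# One factual claim: item (subject) - property (predicate) - value (object).
Statement = namedtuple("Statement", ["item", "property", "value"])

STATEMENTS = [
    Statement("Q189784", "P31", "Q7889"),   # Doom - instance of - video game
    Statement("Q189784", "P577", "1993"),   # Doom - publication date - 1993 (simplified)
]

# Human-readable labels kept apart from the identifiers, as in Wikidata.
LABELS = {
    "Q189784": "Doom",
    "Q7889": "video game",
    "P31": "instance of",
    "P577": "publication date",
}


def values_of(item, prop):
    """Return every value stated for the given item and property."""
    return [s.value for s in STATEMENTS
            if s.item == item and s.property == prop]


def items_with(prop, value):
    """Return every item that carries the given property-value pair."""
    return [s.item for s in STATEMENTS
            if s.property == prop and s.value == value]


def describe(statement):
    """Render a statement using labels where they are known."""
    return " - ".join(LABELS.get(part, part) for part in statement)
```

Because identifiers and labels are stored separately, the same statement can be displayed in any language for which a label exists.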
41. Wikimedia Foundation
● The nonprofit that hosts Wikipedia, Wikidata, and others
● Created MediaWiki which is used in Wikipedia and Wikibase which is
used in Wikidata
● Donations and contributions US$98 million for 07/2017 - 06/2018
● 300 staff and contractors
● 200,000 volunteer editors
42. Wikipedia
● A free encyclopedia, written collaboratively by the people who use it.
Anyone can edit almost every page.
● 5th most popular website in the world (as of May 16, 2018)
● More than 5.7 million English, 2.2 million German and 1.1 million
Japanese articles (Dec. 14, 2018)
● Encyclopædia Britannica - 120,000 articles
● “One posts their misinformation, someone corrects it and the first
author posts his points right back.”
45. Wikidata
● 53 million items (04/30/2018)
● Page views by country in 2017: 8.08M Germany; 5M USA; 4.1M Russia
● 2017: a Wikidata turning point. Wikidata used by
○ Google Knowledge Graph
○ Digital assistants: Siri, Alexa
○ Infoboxes on Wikipedia
47. The rise of Wikidata as a linked data source
http://hangingtogether.org/?p=6775
48. Wikidata examples
● Nintendo DS https://www.wikidata.org/wiki/Q170323
● Doom https://www.wikidata.org/wiki/Q189784
49. Quantitative analysis - SPARQL endpoint
● Create your own query
● Modify example (Number of films by year and genre) to show video
game (Q7889) information
● Change from Scatter chart to Table
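As a rough illustration of such a query, the sketch below builds a SPARQL query that counts video games (Q7889) by publication year and sends it to the public Wikidata endpoint. It is a simplified stand-in for the modified example query, not a copy of it: it groups by year only and omits genre. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def games_per_year_query(limit=20):
    """Build a SPARQL query counting video games (Q7889) per publication year."""
    return f"""
SELECT ?year (COUNT(DISTINCT ?game) AS ?games) WHERE {{
  ?game wdt:P31 wd:Q7889 ;
        wdt:P577 ?date .
  BIND(YEAR(?date) AS ?year)
}}
GROUP BY ?year
ORDER BY DESC(?games)
LIMIT {int(limit)}
""".strip()


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the result rows (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "game-preservation-example/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    for row in run_query(games_per_year_query(limit=10)):
        print(row["year"]["value"], row["games"]["value"])
```

The same query text can be pasted into the query service's web interface, where the result can be shown as a table or a chart.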
50. Wikidata - Users
● National Library of Wales
○ https://blog.wikimedia.org/2016/11/05/wikidata-visiting-scholar-art-dataset/
● The Smithsonian
○ https://confluence.si.edu/display/LODPP/Smithsonian+Open+Data+Pilot
● Europeana
○ https://pro.europeana.eu/page/get-your-vocabularies-in-wikidata
● Yale / BnF / Open Preservation Foundation
○ (https://ipres2017.jp/wp-content/uploads/7.pdf)
51. Issues in Wikidata
● Data model - properties decided by Wikidata
● Ensure properties listed in Wikidata behave according to
your expectation - e.g. broad match (Q39894595)
● Data can be edited by anyone
● All data is published under CC0 (public domain)
52. What is Wikibase
● Wikibase is the software that enables MediaWiki to store structured
data or access data that is stored in a structured data repository.
53. Wikibase
● Address the following issues
○ Control on who can edit information
○ Implement the data model that best fits your needs (your own
interpretation of work, expression, manifestation, etc.)
○ Contribute to LOD - Persistent URL
○ Quantitative analysis - SPARQL endpoint
54. Wikibase Issues
● If institutions are not using the same Wikibase, how can they
synchronize among different instances of Wikibase hosted by
different institutions?
● Resource to host Wikibase instance
● Understand the properties listed in Wikibase
● Know how to install, maintain the software
55. Wikibase - Users
● OCLC (controlled vocabularies)
○ https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-prototype.html
● Rhizome (modeling for preservation of digital art)
○ https://wikimediafoundation.org/2018/09/06/rhizome-wikibase/
● German National Library (controlled vocabularies)
○ https://wiki.dnb.de/display/GND/Authority+Control+meets+Wikibase
56. Rhizome - early Wikibase user
● In the digital arts field, we deal with pretty specialized performance information
that the world at large is probably not interested in, or the community hasn’t
come to an agreement how to describe it.
● Licensing restrictions of Wikidata and Commons prevent certain information to
be stored there: for instance, reference information about software would in
many cases be contained in screenshots, which for Rhizome’s purposes is not
permitted on Wikidata and Commons.
57. Federated Wikibase Instances
● In digital art, artists have sometimes deliberately strayed away from
standards, or have exploited very specific versions of software and file
formats. Here we see a large need for federation [Ed. note: meaning
individual but interconnected databases]: many different Wikibases,
used by individual organizations, containing specialized data, while all
pointing to the same Wikidata items, describing these items from the
perspective of their own specialization.
58. One Possible Scenario
● Institutions that collect video games post their holdings in Wikidata
● Institutions describe video games according to their models with
reference to video game items in Wikidata
● Institutions add cross references to things which are equivalent
● Users perform federated query to get the information they need
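The scenario can be sketched with in-memory lists standing in for two institutional Wikibase instances. Every local record and institution name below is an invented placeholder, not real holdings data. Only the Wikidata identifiers Q189784 (Doom) and Q170323 (Nintendo DS) come from the slides. A real deployment would more likely use SPARQL federation between endpoints than Python dictionaries.

```python
# Placeholder records: each institution describes its holdings in its own way
# but points to the shared Wikidata item.
INSTITUTION_A = [
    {"local_id": "A-001", "wikidata": "Q189784", "carrier": "floppy disk"},
]
INSTITUTION_B = [
    {"local_id": "B-17", "wikidata": "Q189784", "note": "strategy guide"},
    {"local_id": "B-18", "wikidata": "Q170323", "note": "console hardware"},
]


def federated_lookup(qid, **sources):
    """Return, per institution, the local records that point to one Wikidata item.

    `sources` maps an institution name to its list of records.
    Institutions without a matching record are left out of the result.
    """
    hits = {}
    for name, records in sources.items():
        matching = [r["local_id"] for r in records if r["wikidata"] == qid]
        if matching:
            hits[name] = matching
    return hits


DOOM_HOLDINGS = federated_lookup(
    "Q189784", institution_a=INSTITUTION_A, institution_b=INSTITUTION_B)
```

The shared Wikidata identifier is what makes the lookup possible: each institution keeps its own data model, and the cross-reference is the only thing they need to agree on.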
59. Reference
● McDonough, J. P., Olendorf, R., Kirschenbaum, M., Kraus, K., Reside, D., Donahue, R., & Rojo, S., 2010.
Preserving Virtual Worlds Final Report.
● Lee, J. H. et al., 2014. Relationships among video games: Existing standards and new definitions.
Proceedings of the ASIST Annual Meeting.
● Lee, J. H., Clarke, R. I. & Perti, A., 2015. Empirical evaluation of metadata for video games and
interactive media. Journal of the Association for Information Science and Technology.
● Fukuda, K. & Mihara, T., 2018. A Development of the Metadata Model for Video Game Cataloging:
For the Implementation of Media-Arts Database. IFLA WLIC 2018.
● de Groat, G., 2015. A History of Video Game Cataloging in U.S. Libraries. Cataloging & Classification
Quarterly.
● Kaltman, E. et al., 2016. Implementing Controlled Vocabularies for Computer Game Platforms and
Media Formats in SKOS. Journal of Library Metadata.
60. Reference
● https://en.wikipedia.org/wiki/List_of_most_popular_websites
● https://meta.wikimedia.org/wiki/List_of_Wikipedias
● https://wikimediafoundation.org/2018/09/06/rhizome-wikibase/
● https://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
● https://stats.wikimedia.org/v2/#/wikidata.org
● https://www.wikidata.org/wiki/Wikidata:List_of_properties
● http://www.softwarepreservationnetwork.org/1201-exemption-guide-for-software-preservationists/
● https://www.arl.org/storage/documents/publications/2018.09.24_softwarepreservationcode.pdf
Video
● Wikidata: State of the Project - Lydia Pintscher at WikidataCon 2017
● MCN 2018 - Art Wiki at SFMOMA
● GLAM WIKI 2018