The Meertens Institute, part of the Royal Netherlands Academy of Arts and Sciences, is also a memory institution, where records are digitally preserved and curated. This talk will give an overview of the different types of records currently digitally curated at the Meertens Institute. We highlight our recent projects, such as the Sailing Letters project, where we use crowdsourcing to transcribe centuries-old handwritten letters, or the Radical Political Representation project, where we crowdsource the analysis of political cartoons. These are all exemplary Digital Humanities cases, and we show our approach to the digital archiving of these materials, from creation to (re-)use.
Digital Archiving at the Meertens Institute
1. Digital Archiving
at the Meertens Institute
Martine de Bruin, Jan Pieter Kunst, Maarten van
der Peet, Marc Kemps-Snijders, Douwe Zeldenrust,
Rob Zeeman, Junte Zhang
Meertens Institute, Royal Netherlands Academy
of Arts and Sciences
2. The Meertens Institute
• An institute of the Royal Netherlands
Academy of Arts and Sciences (KNAW)
• Studies Dutch language and culture
– Variation in language
– Ethnography
• Increasingly active in the e-Humanities, providing the infrastructure.
3. Technical Development
Department of the Meertens
Institute
• Led by Marc Kemps-Snijders
• Web developers
• User interface expert
• Java developers
• Technical application manager
• …and someone who does search engines
4. About me
• Name is Junte Zhang
• Information scientist.
• Specialized in digital
libraries and IR
• PhD thesis “System
Evaluation of Archival
Description and Access”
• …and someone who works
on search engines at the
Meertens Institute.
5. Talk is about…
• Overview of different approaches used for
digital archiving of records at the
Meertens Institute
– Interesting (finished) projects!
– Our vision of digital archiving
• Highlighting access and (re-)use.
8. Archival Information Systems (2)
• Archon or ICA-AtoM or Adlib or Fedora or
DSpace or Alfresco or ….?
• Traditional archives, audio-visual
archives, personal archives...?
– At the Meertens we have them all
10. Data (1)
• Audio-visual records
– Audio archive of the Meertens:
• https://www.meertens.knaw.nl/audio/ (only accessible at
Meertens)
– Photo archives on singers of Dutch songs or pilgrimages
• http://www.meertens.knaw.nl/bedevaart/
• Textual records
– The Dutch Song Database is a website with descriptions of more
than 150,000 Dutch songs.
– Transcriptions of Spoken Dutch, consisting of ~1.4 million
sentences.
• Many more unique records!
13. Digital Archiving Actions
• Difference between digital archiving of
analogue records and born-digital records
– Designing Data (*)
– Creating Data
– Deciding What Data to Keep (*)
– Ingesting Data
– Storing Data
– Using and Reusing Data
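The notes describe this as a lifecycle that never really ends, so the six actions read as a cycle rather than a one-way pipeline. A minimal Python sketch of that reading, assuming the actions follow each other in the listed order and that reuse loops back to design (the loop-back is an interpretation, and the names `ArchivingAction` and `next_action` are hypothetical):

```python
from enum import Enum


class ArchivingAction(Enum):
    """The six digital archiving actions from the slide, in lifecycle order."""

    DESIGN = "Designing Data"
    CREATE = "Creating Data"
    SELECT = "Deciding What Data to Keep"
    INGEST = "Ingesting Data"
    STORE = "Storing Data"
    REUSE = "Using and Reusing Data"

    def next_action(self) -> "ArchivingAction":
        """Return the following action; the last one wraps around to the first."""
        members = list(ArchivingAction)
        return members[(members.index(self) + 1) % len(members)]


if __name__ == "__main__":
    action = ArchivingAction.DESIGN
    for _ in range(7):  # one full cycle, then the restart at DESIGN
        print(action.value)
        action = action.next_action()
```

Modelling the actions as an enum keeps the order in one place; the modulo step is what makes the lifecycle cyclic.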
15. Designing Data
• Done by researchers ad hoc (can be
research itself)
• Using existing markup standards
• Considerable use of FileMaker
• Custom editors to provide tailored support
17. Creating Data
• Data at the Meertens Institute is research
data and stored memory
• Paper materials (analogue archives) to
digital records ( = digitization)
– Conversion
– Surrogates
• Born-digital records (digital archives)
– Mostly analogue archives at the Meertens
Institute but with digital archiving
39. Example: Radical Political Representation
• A joint project of the NIOD institute (historical
research) and Meertens Institute (technology)
• Develop a framework to describe cartoons
systematically
• Enhance understanding of crowdsourcing
• Gain insight into war-time propaganda and the
development of political culture using political
cartoons
• Make political cartoons accessible and explorable
• URL: http://www.meertens.nl/vova/
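Crowdsourced description usually means that several volunteers describe the same cartoon, so their answers have to be combined. The slide does not say how the project does this; the sketch below uses simple majority voting, a common baseline, purely as an illustration. The function name, the `min_agreement` threshold and the example labels are hypothetical.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.5):
    """Combine volunteer labels per item by majority vote (illustrative only).

    annotations: iterable of (item_id, label) pairs, one per volunteer answer.
    Returns {item_id: label} for items whose most frequent label has MORE than
    `min_agreement` of that item's votes; ties and weak majorities are left
    out, e.g. so that an expert can review them.
    """
    per_item = {}
    for item_id, label in annotations:
        per_item.setdefault(item_id, Counter())[label] += 1

    result = {}
    for item_id, counts in per_item.items():
        label, votes = counts.most_common(1)[0]
        if votes / sum(counts.values()) > min_agreement:
            result[item_id] = label
    return result
```

With three volunteers labelling one cartoon "propaganda", "propaganda" and "satire", that cartoon is assigned "propaganda"; a one-to-one split is withheld.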
40. Appraise and Select
• Coordinator of research collections and
management (gatekeeper, editor?)
• Pragmatic appraisal: acquisition follows
the research needs
42. Store Data
• Managed by IT staff at the
Computerization and Automation
(Informatisering & Automatisering)
department of the KNAW
• Intention: long-term archiving in
specialized archival repositories like DANS
and the TLA
43. Using and Reusing Data (1)
• Authorization and authentication
– SURFnet
• Many roads lead to Rome
– Different technologies
• Individual search engines of projects
– MIMORE: http://www.meertens.knaw.nl/mimore/
– NLB: http://www.liederenbank.nl/
– …
• Transform data for a unified search engine:
– CMDI MI Search Engine
46. Transforming and Reusing Data (2)
• Using and building CLARIN (Common Language Resources and Technology
Infrastructure) in the Netherlands
• Diverse collections diversely described
– (Transformed to) Federated metadata
– Different views
• Metadata is the pivot in CLARIN
– Each resource always has metadata (context)
• Semantic, serendipitous and focused access
49. Using and Reusing Data (3)
• Connecting to content search engines, not
only access to metadata, whenever possible
• Integration of more collections of Meertens,
Netherlands, and (hopefully) Europe
• Continuous life cycle and digital archiving
– Searching in virtual research environments
– Open issues: e.g. Authorization
50. Re: Ingest and Store Data
[Figure: architecture of the ingest and storage pipeline. Components shown: component registry, ISOcat, schema database service, Meertens CMDI dump, OAI-PMH, CLARIN-EU OAI harvester, Nederlab/CLARIN (envisaged), indexing with Solr, Meertens search, CLARIN search.]
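The pipeline ends in an indexing step (Solr) that feeds the search interfaces. Solr does far more (text analysis, relevance ranking, faceting), but the core idea of an inverted index fits in a few lines. A minimal Python sketch, assuming each harvested CMDI record has already been flattened into a dictionary of field values; the class name, record ids and field names are hypothetical:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class MiniIndex:
    """A toy inverted index, standing in for Solr in this sketch."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of records containing it
        self.records = {}                 # record id -> flattened fields

    def add(self, record_id, fields):
        """Index every field value of one flattened metadata record."""
        self.records[record_id] = fields
        for value in fields.values():
            for term in tokenize(str(value)):
                self.postings[term].add(record_id)

    def search(self, query):
        """Return the ids of records containing all query terms (AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits
```

Because records from different collections land in one index, a single query can cover, for example, both MIMORE and Dutch Song Database metadata, which is the point of a unified search engine.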
59. Open Issues
• Dispose?
• Version control?
• Quality checks?
• Formalizing and automating our approach to
digital archiving?
• …?
60. Conclusion
• Presented an overview and our vision of
Digital Archiving at the Meertens Institute
• Highlighted using and reusing data
The Meertens Institute is very active in the e-Humanities, in particular in providing the research infrastructure for Humanities research. This is not surprising, because almost all of us have a background in the Humanities, including the developers. The technical development department is led by Marc Kemps-Snijders (MKS) and consists of 9 people.
I should introduce myself. I am Junte. My background is in information science, and my skills mostly come from language technologies. I specialize in digital libraries and information retrieval. A year ago, I obtained my doctorate on this subject with my PhD thesis “System Evaluation of Archival Description and Access”. At the Meertens Institute, I still work on digital libraries (which include digital archives), primarily on search technology.
So what will this talk be about? I will present an overview of the digital archiving of records at the Meertens Institute by looking at projects here that are interesting (in my opinion), some of them still ongoing. Through these projects, I try to make clear our team's vision of digital archiving. For archivists, the immediate question is: what is the record here? A record is the smallest logical piece of information, and what counts as one depends on whether the material is analogue or digital. In paper form, a record can be a letter. In digital form, it can be a file, and this is a very fluid definition, because part of a file can become a file itself. In my opinion, this fluidity simply opens up interesting cases for digital archiving. I will also highlight access, use and reuse in this context.
When you search for digital archiving on Wikipedia, you get redirected to “Document or record management system” (or …). Here, “document” and “record” are used in a very generic sense: a document or record can be a book, a page in a book, or a paragraph, or another document genre, such as a video clip or an MP3 file. The essence is that records and their contexts are preserved for long-term use. So digital archiving is about appraising, arranging and describing digital assets. This means that you do not preserve everything; in archiving, the selection of what to archive is also essential.
So digital archiving can be described as a document or record management system. Here, “system” should be understood not as an application but as a model. The Consultative Committee for Space Data Systems (CCSDS), in which NASA played a major part, developed the Open Archival Information System (OAIS) reference model, later standardized as ISO 14721. (explain) This is a formal and generic model, and much of digital archiving is based on it.
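The OAIS reference model describes information moving through packages: a producer submits a Submission Information Package (SIP), the archive preserves an Archival Information Package (AIP), and consumers receive a Dissemination Information Package (DIP), via the functional entities Ingest and Access. A minimal Python sketch of that flow, not the Meertens tooling; recording a SHA-256 checksum as fixity information at ingest, and all class and field names, are illustrative assumptions:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package: what a producer hands over."""
    content: bytes
    metadata: dict


@dataclass
class AIP:
    """Archival Information Package: what the archive preserves."""
    content: bytes
    metadata: dict
    fixity: str  # SHA-256 hex digest recorded at ingest (an assumption)


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    content: bytes
    metadata: dict


def ingest(sip: SIP) -> AIP:
    """Ingest: turn a submission into an archival package with fixity info."""
    digest = hashlib.sha256(sip.content).hexdigest()
    return AIP(sip.content, dict(sip.metadata), digest)  # shallow copy of metadata


def access(aip: AIP) -> DIP:
    """Access: verify fixity, then derive a package for the consumer."""
    if hashlib.sha256(aip.content).hexdigest() != aip.fixity:
        raise ValueError("fixity check failed: stored content has changed")
    return DIP(aip.content, dict(aip.metadata))
```

Keeping the three package types distinct mirrors the model's point that what is submitted, what is preserved and what is delivered need not be the same thing.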
This model has been used by digital archiving software, such as … There are also different types of archives: traditional archives (explain about governments, corporations, persons), audio-visual archives, personal archives... At the Meertens we have them all, in one form or another. We have developed our own digital archiving approach, also based on the OAIS model, with our own tools. I should stress that at the Meertens, we only digitally archive unique records, and only on demand.
To illustrate our vision of digital archiving, the Curation Lifecycle Model of the Digital Curation Centre (DCC) is particularly helpful. Digital archiving is also digital curation, and the lifecycle model shows that it never really ends.
We developed an approach to digital archiving.
We can also use real volunteers. Then we speak of crowdsourcing.
Part of using and reusing data is authorization and authentication. At the Meertens Institute, we use SURFnet for our online digital archives. Different technologies are used to provide access to the digital archives. The individual projects have their own search engines, such as the MIMORE search engine (developed by colleague Jan Pieter Kunst), which provides access to 3 different datasets on dialects. Another example is the Dutch Song Database, which searches across different sources of data. The common denominator of these search engines is the technology used: the data is stored in MySQL and exported using SQL queries in PHP scripts. But can we combine the different data, and encourage reuse, using a single search engine? What advanced search technologies can we use? Can we combine the different existing search engines? These are some of the questions I am trying to address.
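The shared pattern here is that project data lives in a relational database and is exported with SQL by a script. The sketch below shows the same pattern in Python, using the standard-library `sqlite3` module as an in-memory stand-in for MySQL and Python in place of PHP; the table name, columns and the flat `<record>` XML layout are hypothetical and are not CMDI.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_records(conn, table):
    """Yield one XML string per row, with one child element per column.

    `table` is interpolated into the SQL string, so it must come from trusted
    code, never from user input. Column names must be valid XML element names.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = ET.Element("record")
        for name, value in zip(columns, row):
            child = ET.SubElement(record, name)
            child.text = "" if value is None else str(value)
        yield ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE songs (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO songs VALUES (1, 'Een lied')")
    for xml_record in export_records(conn, "songs"):
        print(xml_record)
```

Exporting every project to one neutral record format is a first step towards feeding different databases into a single search engine.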
We have transformed the data, and we are reusing it. Before this is possible, we again perform an ingest and store archiving action. In practice, this means we harvest the federated metadata and index it.
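Harvesting federated metadata is typically done with OAI-PMH, the protocol listed on slide 50: the harvester issues `ListRecords` requests and follows each `resumptionToken` until the list is complete. A minimal Python sketch, assuming a caller-supplied `fetch` function for HTTP so the loop can be tested offline; the base URL and the `cmdi` metadata prefix in the usage note are hypothetical, and OAI-PMH error responses such as `noRecordsMatch` are not handled:

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, token=None):
    """Build a ListRecords request URL; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def harvest(base_url, metadata_prefix, fetch):
    """Yield (identifier, metadata element) for every record in the repository.

    `fetch` takes a URL and returns the response body as bytes or str.
    Deleted records carry no <metadata>, so their element is None.
    """
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        root = ET.fromstring(fetch(url))
        for record in root.iterfind(".//oai:record", OAI_NS):
            identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
            yield identifier, record.find("oai:metadata", OAI_NS)
        token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
        if not token:  # missing or empty token: the list is complete
            return
```

In real use, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`, and each yielded metadata element would then be flattened and indexed.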
Finally, the reuse of the data. This is done with this search engine, which we have baptized as the CMDI MI Search Engine.