Presentation held by Jussi Nuorteva (Finnish National Archives) at "Freedom for Information - the Power of Open Data in the Cultural Field" on 02 May 2016 at the Upper Austrian State Archives (AT).
1. Power of Open Data
in Archives
OpenGLAM
Linz
2nd May 2016
Jussi Nuorteva
2. Outlines of the presentation
• National Archives Services of
Finland – Brief Introduction
• Building up a Digital Society
- Can analogue records be destroyed
after digitization?
• Renewal of Research Process
through Digital Research Data
• Some Development Projects
3. Archives Act
of Finland
831/1994
4 §
”The duty of the National Archives
Service is to ensure the
preservation and availability of
records belonging to the national
cultural heritage, to promote
research and to guide, develop and
study archives and records
administration.”
4. Helsinki – The National Archives
Hämeenlinna Provincial Archive
Joensuu Provincial Archive
Jyväskylä Provincial Archive
Mikkeli Provincial Archive
Oulu Provincial Archive
Turku Provincial Archive
Vaasa Provincial Archive
Sámi Archive, Inari (National
Archives)
National Archives
Service of Finland • Inari
7. Military Archives & Archives of the Prime
Minister's Office to the National Archives in 2008
8. Construction work has started at the site of the new Central
Archive of the National Archives in Mikkeli: April 2016 (ready in 2018)
10. SAJOS
Sámi Archives
(2012)
http://www.arkisto.fi/en/the-national-archives-service/saamelaisarkisto-3
11. ”The acquisition of private documents aims at creating
archives that offer an authentic, balanced and sufficient
picture of various sectors of the Finnish society in different
eras.”
Archives – a mirror
of the society?
14. Closing party of the AvoinGLAM (OpenGLAM)
at the National Archives
Hack4FI 2015
OpenGLAM
15. Holdings and services in brief
• 213 shelf kilometers of documents
• 10 % of the holdings are private archives (individuals,
associations, enterprises)
• 43 million digital units (2 % of the holdings)
• 50 million annual downloads from the Digital Archive
(2012)
• 12 000 distinct user IP addresses per month
• 50 000 on-site research visits annually
• 72 000 orders at the service desk
• 21 000 public and private surveys done for customers
16. Are we really living up to and using all the
opportunities of the digital world?
• More than 200 km of analogue holdings still coming in!
• More than 15 million euros spent annually on rents of
archival space in the governmental administration!
• Duplicated service systems – analogue and digital!
• Digitization will take decades at today's speed!
• Risk of marginalization of the analogue materials!
• Autonomous municipalities are inefficient in e-management
• How can we better use and share public data throughout the
public administration?
17. Digitization of Public Services – Spearhead
of the present Governmental Program
• Adopted in August 2015
• Public information should be collected only once!
• Extensive digitalization to be carried out
• Avoid duplication of processes in public administration
• Open access to public information with regard to the
legislation and EU directives protecting the data from
misuse (ownership, protection of personal data, security)
• Freedom of academic research is a constitutional right
18. Digitization and e-Services in the Strategy
of the National Archives of Finland 2020
• Promotes electronic archiving in public administration,
and participates actively in the development of
storage solutions
• Enhances open-minded utilisation of modern
technologies in the provision of services
• Analogue materials transferred for permanent storage
will be digitised as part of the transfer process.
• The proposed amendment of the Archives Act will
enable the destruction of analogue material without
changing the legal probative force of the documents.
19. How to proceed in practice?
• Reform of the Archives Act
– National Archives to be turned into one legal entity on 1.1.2017
– Right to destroy analogue records after digitization without
them losing their legal force (ca. 90 %), taking into account their
value as national cultural heritage. International survey.
– New law on Private Archives on 1.1.2017
• Decisions of the State Council (proposal)
– Provider is responsible for transfer costs throughout
administration – digitization is interpreted as transfer
– ICT system for preservation of digitized and born-digital
records, including their operational use - Open Data principle
• Building up a process for mass-digitization
22. ”The National Archives
and the San Diego
Supercomputer Center
sign landmark agreement
… The partnering of NARA,
SDSC and NSF comes at a
time when the nation's
scientists and engineers are
seeking to increase U.S.
competitiveness and
leadership--and when the
massive amounts of digital
data being generated by
researchers, educators and
practitioners are demanding
new and innovative strategies
for digital preservation”
23. OECD 2004
”The rapid development in
computing technology
and the Internet have
opened up new
applications for the basic
sources of research —
the base material of
research data — which
has given a major
impetus to scientific work
in recent years.
Databases are rapidly
becoming an essential
part of the infrastructure
of the global science
system.”
24. ”Supporting research
and innovation is a key
priority of the Agenda”
[Digital Agenda for Europe]
”To collect, curate, preserve
and make available ever-
increasing amounts of
scientific data, new types of
infrastructures will be
needed”
Neelie Kroes, Vice-President of the
European Commission, responsible
for the Digital Agenda
25. AAAS 2011
Washington D.C.
• Not only for natural
sciences!
• Data-sharing and
interoperability
• Linking digital records to
scholarly publishing
• Open source and open
access (OAIS)
• Trusted digital
repositories and IPR
questions relating to data
• Licensed data and open
access principle
27. Digital Agenda for Europe 2020
• ”Supporting research and innovation is a key priority of the
Agenda” [Digital Agenda for Europe]
• ”To collect, curate, preserve and make available ever-increasing
amounts of scientific data, new types of infrastructures will be
needed”
Neelie Kroes, Vice-President of the European Commission,
responsible for the Digital Agenda
31. Privacy and Research in Personal Data
Act (523/1999) of Finland
• Personal Data Act – General prohibition on processing
sensitive data (state of health, handicap, illness, race or
ethnicity, social welfare needs etc.)
• BUT: The prohibition does not prevent the processing of data
for purposes of historical, scientific or statistical research,
nor does it prevent a health care unit or a health care
professional from processing data collected in the course of
their operations and relating to the state of health, illness or
handicap of the data subject or the treatment or other measures
directed at the data subject
43. Johan Göös's Coat of Arms with
Ancestral Coats of Arms
Johan Göös 1697
Birckholtz Fincke Frille
Göös Wildeman Ållongren
44. Recognition and Enrichment of
Archival Documents (READ)
• Funded by the European
Commission for 3.5 years
• 13 partners from across Europe
– computer scientists,
archivists and scholars
• Handwritten text recognition
(HTR)
• Allows users to search and
automatically transcribe
digitised historical material
45. Transkribus
• A downloadable platform for the automated
recognition, transcription and searching of
historical documents.
46. In Transkribus users can…
• Upload their own documents - keep them
private or share them with other users
• Transcribe text for the training of HTR or in
order to make digital editions
• Transcriptions can be enhanced with tags
• Documents can be exported in several
formats – PAGE XML, TEI, PDF
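As an illustration of working with one of the export formats listed above, the following is a minimal sketch of reading line-level transcriptions from a PAGE XML file using only the Python standard library. The sample document is hand-written and heavily simplified (real exports also carry coordinates, baselines and metadata), and the namespace URI is an assumption: it matches the 2013-07-15 PAGE schema version, which may differ from what a given export uses. The function name `extract_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace (PAGE schema version 2013-07-15); check the root element
# of your own export, since other schema versions use a different URI.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Hand-written, simplified sample; not an actual Transkribus export.
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <TextEquiv><Unicode>First transcribed line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <TextEquiv><Unicode>Second transcribed line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
""".strip()


def extract_lines(page_xml: str) -> list[tuple[str, str]]:
    """Return (line id, transcription) pairs in document order."""
    root = ET.fromstring(page_xml)
    lines = []
    for line in root.iterfind(".//pc:TextLine", PAGE_NS):
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", PAGE_NS)
        # A line may exist without a transcription; treat it as empty text.
        text = unicode_el.text if unicode_el is not None and unicode_el.text else ""
        lines.append((line.get("id", ""), text))
    return lines


transcribed = extract_lines(SAMPLE_PAGE_XML)
```

Extracted lines of this kind could then be fed into a full-text index or compared against a manual transcription.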
49. READ will revolutionize access to
archival collections…
• By offering the possibility of full-text search for
handwritten documents (due to HTR)
• By providing tools that will make it easier to index and
analyze digitized collections (even automatically)
• By giving a platform where research groups across the
world can work together with common data
• By enabling new kinds of research use of extensive
handwritten materials
50. Diplomatarium Fennicum
• Research database of medieval documents
concerning Finland
• Consists of 6,700 documents from the 11th to the
16th century
• Based on Finlands Medeltids Urkunder (1910-1935)
by Reinhold Hausen
• Beta release in October 2016
51. DF Database contains
• All existing editions
of the documents
• Enriched metadata
• Images of original
charters
• Bibliography
• Connection to other
medieval databases
52. • Advanced search mode; all the different
editions will be searchable
• Related documents are linked together
• New critical editions that meet the demands of
linguistics (gradually)
• Possibility for the use of crowdsourcing
54. Administrative registers as a
source for scientific research
• Official data are often in the format of electronic
records in Finland
• Registers contain different kinds of individual level
data which are needed for administrative purposes
(e.g. health services, social services, education,
taxation)
• All registers use the same personal identity code
(PID) for each Finnish citizen
• Legislation allows data from different registers to be
linked by PID for research purposes which makes
register data a rich source for scientific research
• The confidentiality of the personal data is a special
challenge for access services
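The linkage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two register extracts, their field names and the PIDs are invented, and replacing the PID with a keyed hash is one possible way of addressing the confidentiality point, not a description of how Finnish register authorities actually deliver data.

```python
import hashlib
import hmac

# Hypothetical extracts from two administrative registers; all values are made up.
health_register = [
    {"pid": "010170-001A", "diagnosis": "J45"},
    {"pid": "020280-002B", "diagnosis": "E11"},
]
education_register = [
    {"pid": "010170-001A", "degree": "MSc"},
    {"pid": "030390-003C", "degree": "BA"},
]


def pseudonymize(pid: str, secret: bytes) -> str:
    """Replace a PID with a keyed hash so the research file carries no raw PID."""
    return hmac.new(secret, pid.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def link_registers(left: list[dict], right: list[dict], secret: bytes) -> list[dict]:
    """Inner-join two registers on PID and return pseudonymized rows."""
    right_by_pid = {row["pid"]: row for row in right}
    linked = []
    for row in left:
        match = right_by_pid.get(row["pid"])
        if match is None:
            continue  # person appears in only one of the registers
        merged = {**row, **match}
        merged["pseudo_id"] = pseudonymize(merged.pop("pid"), secret)
        linked.append(merged)
    return linked


linked_rows = link_registers(health_register, education_register, b"project-secret")
```

Because the same secret yields the same pseudonym for the same person, further registers could be linked later without the researcher ever seeing the PID itself.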
55. Aims of the FMAS
The Finnish Microdata Access Services (FMAS) are being developed in
order to
1) inform and guide researchers about the possibilities of register
research
2) help researchers find available data
3) provide an electronic permit system through which researchers can
apply for permits with a single application per study and
4) make available a remote desktop service, where researchers can
securely obtain and analyze unit-level data
Taken together, the new services will support the entire process of
register-based research.
56. Construction project
• In 2013, FMAS was accepted into Finland’s roadmap
for research infrastructure.
• The construction project is hosted by the National
Archives and Statistics Finland
• Funded so far by the Academy of Finland (2014), the
Social Insurance Institution (2015-16), and the host
organizations
• More funding has been and will be applied for
• The services will be available in 2017 at the earliest
57. Joint permit service
• The National Archives is constructing the FMAS joint
permit service
• The service will be a centralized digital service through
which researchers can apply for permits to use register
data
• Only one application per study will be needed; the
researcher can request permits to use data from several
different organizations simultaneously through the same
interactive digital application form
• For the authorities, the service will offer a platform
for receiving and handling the applications and for storing
the decisions, which are official administrative documents
• The access to the system and documents will be
controlled by strong authentication methods
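The "one application, several organizations" idea above can be pictured with a small data model. The sketch below is purely illustrative: the class, field and organization names are hypothetical and say nothing about how the actual FMAS permit service is implemented. It shows a single application that collects one decision per register-holding authority.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    GRANTED = "granted"
    REFUSED = "refused"


@dataclass
class PermitApplication:
    """One application per study, addressed to several register holders at once."""
    study_title: str
    applicant: str
    # One decision slot per organization; each decision is an official document.
    decisions: dict[str, Decision] = field(default_factory=dict)

    def request(self, organization: str) -> None:
        """Add an organization to the application (no effect if already added)."""
        self.decisions.setdefault(organization, Decision.PENDING)

    def decide(self, organization: str, decision: Decision) -> None:
        """Record the decision of one authority."""
        if organization not in self.decisions:
            raise KeyError(f"no permit was requested from {organization!r}")
        self.decisions[organization] = decision

    def fully_granted(self) -> bool:
        """True only when every organization addressed has granted the permit."""
        return bool(self.decisions) and all(
            d is Decision.GRANTED for d in self.decisions.values()
        )


# Hypothetical usage: one study, two register holders.
application = PermitApplication("Education and health outcomes", "Researcher A")
application.request("Register holder 1")
application.request("Register holder 2")
application.decide("Register holder 1", Decision.GRANTED)
granted_after_first = application.fully_granted()
application.decide("Register holder 2", Decision.GRANTED)
granted_after_second = application.fully_granted()
```

Keeping the decisions keyed by organization mirrors the point that each authority still makes and stores its own decision, even though the researcher fills in only one form.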