2. This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the
Competitiveness and Innovation Framework Programme by the European Community
http://ec.europa.eu/ict_psp
Problem statement
•Europeana Newspapers
• 15 libraries from several European countries
• 10 million newspaper pages for refinement (OCR, OLR)
• Need to be delivered to Europeana
•Approach
• Currently no standard format available
• Unify the delivery format
• Create a METS/ALTO Profile
• Create tools in order to ease creation of ENMAP objects
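As a rough illustration of what a tool for creating such objects might do, the sketch below builds a bare METS wrapper that lists one ALTO file per page and links each page to its file in a physical structMap. It uses only generic METS elements (fileSec, fileGrp, file, FLocat, structMap, div, fptr); it is not the ENMAP profile itself, and the function name, file IDs, paths and the "issue"/"page" type labels are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Return a namespace-qualified name in ElementTree's {ns}tag notation."""
    return f"{{{ns}}}{tag}"


def build_issue_mets(alto_files):
    """Build a bare METS tree for one issue (hypothetical helper, not ENMAP).

    alto_files: list of paths or URLs, one ALTO file per page, in page order.
    """
    mets = ET.Element(q(METS_NS, "mets"))
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "ALTO"})
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "PHYSICAL"})
    issue_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "issue"})

    for order, href in enumerate(alto_files, start=1):
        file_id = f"ALTO{order:05d}"  # assumed ID scheme, for illustration only
        file_el = ET.SubElement(
            file_grp, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": "text/xml"}
        )
        ET.SubElement(
            file_el, q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): href},
        )
        # One page div per ALTO file, pointing back to the file by its ID.
        page_div = ET.SubElement(
            issue_div, q(METS_NS, "div"), {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page_div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return mets


if __name__ == "__main__":
    tree = build_issue_mets(["alto/00001.xml", "alto/00002.xml"])
    print(ET.tostring(tree, encoding="unicode"))
```

A real profile would add descriptive and administrative metadata sections and constrain the allowed attribute values; this sketch only shows the file-to-page linkage.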
3. ENMAP
•Implementation
• More than 3 million pages already processed
• The workflow is fully scalable: up to 100,000 pages can be processed per day (OCR and ENMAP creation)
•Public release
• ENMAP (Europeana Newspapers METS ALTO Profile) available to the public
• Planned for October 2013
• Accompanying information
• Examples
• Feedback is highly welcome
• Final release is planned for 2014
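The throughput figure on this slide can be turned into a back-of-the-envelope schedule. The sketch below assumes the overall target of 10 million pages from the problem statement, takes "more than 3 million" as exactly 3 million, and assumes the peak rate of 100,000 pages per day is sustained, so the result is a best-case estimate only. The function and constant names are made up for the example.

```python
import math


def days_to_process(pages, pages_per_day):
    """Whole days needed to process `pages` at a constant daily rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(pages / pages_per_day)


TOTAL_PAGES = 10_000_000      # project target from the problem statement
PROCESSED_PAGES = 3_000_000   # "more than 3 million", taken as a floor
PEAK_PAGES_PER_DAY = 100_000  # stated upper bound of the workflow

remaining = TOTAL_PAGES - PROCESSED_PAGES
print(days_to_process(remaining, PEAK_PAGES_PER_DAY))  # prints 70
```

At a lower sustained rate the estimate grows proportionally, for example 140 days at 50,000 pages per day.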
4. Structural Metadata
•Structural elements
• Title section, headline, advertisement, illustration, caption, running
title (column title), page number, continuation note, imprint, etc.
•Text types (genres)
• breaking news, short news, book review, theatre review, obituary,
family notice, job announcement, weather forecast, novel, poem,...
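One way to picture these two lists is as controlled vocabularies attached to the items found on a page. The sketch below models them as Python enumerations and counts articles per text type. The values are copied from the slide, but the class names, the Article record and the facet_counts helper are assumptions for illustration and are not taken from ENMAP.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StructuralElement(Enum):
    TITLE_SECTION = "title section"
    HEADLINE = "headline"
    ADVERTISEMENT = "advertisement"
    ILLUSTRATION = "illustration"
    CAPTION = "caption"
    RUNNING_TITLE = "running title"
    PAGE_NUMBER = "page number"
    CONTINUATION_NOTE = "continuation note"
    IMPRINT = "imprint"


class TextType(Enum):
    BREAKING_NEWS = "breaking news"
    SHORT_NEWS = "short news"
    BOOK_REVIEW = "book review"
    THEATRE_REVIEW = "theatre review"
    OBITUARY = "obituary"
    FAMILY_NOTICE = "family notice"
    JOB_ANNOUNCEMENT = "job announcement"
    WEATHER_FORECAST = "weather forecast"
    NOVEL = "novel"
    POEM = "poem"


@dataclass
class Article:
    """Hypothetical record: one article with its genre and its parts."""
    title: str
    text_type: TextType
    elements: List[StructuralElement] = field(default_factory=list)


def facet_counts(articles):
    """Count articles per text type, keyed by the human-readable label."""
    return Counter(article.text_type.value for article in articles)


if __name__ == "__main__":
    sample = [
        Article("In memoriam", TextType.OBITUARY, [StructuralElement.HEADLINE]),
        Article("Spring", TextType.POEM),
        Article("A life remembered", TextType.OBITUARY),
    ]
    print(facet_counts(sample))
```

Both lists on the slide end with "etc." or "...", so a real vocabulary would be longer than the enumerations shown here.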
5. Rationale
•Why do we need these data?
• Increase granularity and information
• Improve search services (faceted search)
• Support crowd-based services (apply these metadata)
• Instruct service providers
•Other standards in the field?
• TEI (Text Encoding Initiative) provides a first starting point but
objectives are different (edition vs. library use)
• Best-practice models of other libraries
6. ENMAP Structural Map
•Objectives
• Contribute to some standardisation in this field
• Set up a list of these elements
• Gather feedback from libraries
• Provide definitions and examples
• Include a first version within ENMAP
7. Thank you for your attention!
Günter Mühlberger
<guenter.muehlberger@uibk.ac.at>