2. TIB (Technische Informationsbibliothek) is the German National Library of Science and Technology
Why architectural data?
subjects: engineering, architecture, chemistry, computer science, mathematics and physics
Competence centre for non-textual materials (KNM)
2007–2011: DFG-funded PROBADO3D project
metadata- and content-based search for digital architectural 3D models
http://www.probado.de/en_3d.html
Why digital preservation?
2009-2011: Goportis digital preservation pilot project,
together with our Goportis partners ZB MED and ZBW
Since 2012:
Goportis digital preservation system hosted by TIB
A few words about TIB
2 / 23
21 / 10 / 13
3. DURAARK (DURAble Architectural Knowledge)
FP7 – ICT – Digital Preservation (STReP)
February 2013 – January 2016
Goal
Develop methods and tools for sustainable long-term
preservation of building data (3D and BIM models,
metadata, related knowledge & Web data)
Scope
• address all layers of digital preservation (bit, logical, semantic)
• interlinked curation and preservation workflows
• focus on two file formats: IFC and E57
• incorporate an existing OAIS-compliant digital preservation system
Project overview
4. Tangible outcomes
Semantic enrichment: vocabularies for the description of built structures, and enrichment techniques based on a unified and sustainable naming scheme
Tailored Workflows: Thoroughly investigate the long-term archiving requirements of institutional stakeholders (libraries/archives) and SMEs, and develop corresponding workflows.
Sustainability of file formats: Address the problem of digital decay by using Industry Foundation Classes (IFC) and E57, open and already well-established file formats suited for long-term preservation. Ensure the availability of characterization tools for these formats.
Goal and Tangible Outcomes
6. UBO: Universität Bonn
- Technical Coordinator
- WP4/WP5: change management, shape recognition
Luleå University of Technology
- WP8 leader, dissemination/exploitation
CITA, Center for Information Technology and Architecture Copenhagen
- WP7 leader, evaluation, test
TUE, Department of the Built Environment, Eindhoven University of Technology
- WP3 leader, semantics & metadata
Catenda, SME
- User perspective, market requirements, evaluation
Fraunhofer Austria
- WP2 leader, system specification & integration
LUH: German National Library of Science and Technology (TIB) & L3S Research Center Hannover
- Coordinator
- WP3 Semantic Enrichment
- WP6 leader, long-term preservation
Consortium
Jakob Beetz (Eindhoven University of Technology)
7. 3 layers of a digital object
8. risks:
• media obsolescence
• technical failure
• human error
• DRM
http://commons.wikimedia.org/wiki/File:Compact_Floppy.jpg
possible actions:
• media migration, refreshing, replication
• technological redundancy, ideally with geographic spread
• error detection, monitoring, recovery & disaster planning
• controlled storage with regular maintenance
• security and trust
Solved through "good IT practice" (which, of course, needs to be implemented …)
1. Bit(stream) [Physical] preservation layer
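A minimal sketch of the "error detection" action listed above, assuming Python 3.9+ and SHA-256 as the fixity algorithm (the slides name neither): a checksum manifest is recorded at ingest and re-verified later, which reveals bit rot, truncated copies and lost files. The function names build_manifest and verify_manifest are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large point clouds need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file below root, keyed by its relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return every file whose current state differs from the manifest."""
    problems = {}
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems[rel] = "missing"
        elif sha256_of(p) != expected:
            problems[rel] = "changed"
    return problems
```

In practice the manifest itself would be stored redundantly, and verification scheduled as part of the regular monitoring named above.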
9. risks:
• software / file format obsolescence
• software / OS / hardware dependencies
• additionally: configuration / package dependencies
• lack of compliance with format standards ("malformed objects")
• DRM
possible actions:
• migration, emulation, normalization
• "hardware museum"
• data/information extraction
• extensive technical metadata capturing
• definition of significant properties (what to preserve)
Basic processes are established, but they require adaptation for new formats.
http://www.flickr.com/photos/89771128@N02/8451172304/in/pool-2121762@N23
2. Logical [object] preservation layer
10. risks:
• terminology and concepts change over time
• context and provenance may be lost
(purpose, setting, limitations, cultural context,
related objects)
possible actions:
• semantic enrichment
• tracing of metadata
• audit trail capturing
• migration at semantic level
• documentation of context
• document intended meaning / interpretation
Least developed area of digital
preservation
3. Semantic [interpretability] preservation layer
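One possible way to implement "audit trail capturing", sketched under our own assumptions: each preservation event becomes a log entry that also stores the hash of the previous entry, so later edits to the recorded history become detectable. The class AuditTrail, its fields and the event labels are hypothetical; a real system would also record timestamps and agents, typically in a standard vocabulary such as PREMIS.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditTrail:
    """Append-only log of preservation events; each entry stores its predecessor's hash."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so an unchanged entry always yields the same hash.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def record(self, event: str, object_id: str, detail: str) -> None:
        body = {
            "event": event,
            "object": object_id,
            "detail": detail,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        self.entries.append({**body, "hash": self._digest(body)})

    def is_intact(self) -> bool:
        """Recompute all hashes; edited, reordered or removed entries (other than the newest) fail."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body.get("prev") != prev or self._digest(body) != entry.get("hash"):
                return False
            prev = entry["hash"]
        return True
```

The chain only proves that the recorded history was not silently rewritten; it does not by itself capture context, which still has to be documented in the entries.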
16. Consumer Use Cases
• result of stakeholder analysis
• describe desired use, re-use, access
• will be addressed in the geometric and semantic enrichment processing layer
Knowing why something should be preserved helps us evaluate which characteristics to preserve
Use Cases (2/2)
18. Metadata: Technical
"Metadata that describes the technical state of and process used to create a file. Often closely related either to its file format or the original software used to create the file, e.g. scanning equipment and settings used to create or modify a digital object."
http://www.digitalpreservation.gov/ndsa/ndsa-glossary.html
Information needed in order to maintain access to the file
Significant properties: criteria which an institution considers important factors of an object's quality, structure or behaviour, which should be preserved over time, i.e. over the course of digital preservation actions.
http://public.ccsds.org/publications/archive/650x0m2.pdf
Technical Metadata
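To make the notion of significant properties concrete, the hypothetical sketch below compares property values extracted before and after a preservation action (e.g. a format migration) and reports which declared significant properties were lost or altered. The property names are illustrative picks from the IFC candidates on slide 21, not a list endorsed by the project, and the numeric tolerance is an assumption.

```python
from typing import Any

# Hypothetical significant properties an institution might declare for IFC models.
SIGNIFICANT_PROPERTIES = ("number_of_stories", "gross_area", "geotagged")


def check_significant_properties(before: dict[str, Any], after: dict[str, Any],
                                 tolerance: float = 1e-6) -> list[str]:
    """List significant properties that a preservation action lost or altered."""
    violations = []
    for name in SIGNIFICANT_PROPERTIES:
        if name not in before:
            continue  # never recorded for this object, so nothing to compare
        if name not in after:
            violations.append(f"{name}: missing after action")
            continue
        old, new = before[name], after[name]
        if isinstance(old, float) and isinstance(new, (int, float)):
            # Allow tiny relative drift, e.g. from unit-conversion round trips.
            if abs(old - new) > tolerance * max(1.0, abs(old)):
                violations.append(f"{name}: {old} -> {new}")
        elif old != new:
            violations.append(f"{name}: {old} -> {new}")
    return violations
```

Properties outside the declared set (such as the schema version, which a migration is expected to change) are deliberately ignored.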
19. Existing tools for various file
formats:
Jhove, Tika, fido, fits,
DROID, …
Few existing tools for IFC
and E57:
E57 validator, IFC validator
File format characterization
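Tools such as DROID and fido identify formats mainly by byte signatures. A minimal sketch of that idea for the two DURAARK formats, assuming Python 3.9+, STEP-encoded IFC (which starts with "ISO-10303-21;" and declares its schema in FILE_SCHEMA) and E57 (whose files start with the 8-byte signature "ASTM-E57"). ifcXML and compressed IFC variants are not covered, and the function identify is hypothetical, not part of any of the tools named above.

```python
import re
from pathlib import Path

E57_MAGIC = b"ASTM-E57"
STEP_MAGIC = b"ISO-10303-21;"
IFC_SCHEMA = re.compile(rb"FILE_SCHEMA\s*\(\s*\(\s*'(IFC[^']*)'", re.IGNORECASE)


def identify(path: Path, probe_size: int = 4096) -> str:
    """Guess the format from the first bytes; the STEP header must fit in probe_size."""
    with path.open("rb") as fh:
        head = fh.read(probe_size)
    if head.startswith(E57_MAGIC):
        return "E57"
    # Tolerate a UTF-8 byte order mark and leading whitespace in text-based STEP files.
    text = head.removeprefix(b"\xef\xbb\xbf").lstrip()
    if text.startswith(STEP_MAGIC):
        match = IFC_SCHEMA.search(text)
        return f"IFC ({match.group(1).decode('ascii')})" if match else "STEP (non-IFC)"
    return "unknown"
```

Checking FILE_SCHEMA matters because other STEP application protocols share the same leading signature.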
20. National Library of Australia: Testing Software Tools of Potential Interest for Digital Preservation
http://www.openplanetsfoundation.org/system/files/Digital%20Preservation%20Project%20Report%20-%20Testing%20Software%20Tools.pdf
21. IFC extraction:
geometry types
schema version
implementation level
application
version of application
measurement units
MVD
geotagged
gross area
number of stories
…
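Several of the IFC candidates above live in the STEP header of the file: FILE_SCHEMA gives the schema version, the second argument of FILE_DESCRIPTION the implementation level, a 'ViewDefinition [...]' entry in its description the MVD, and the sixth field of FILE_NAME (originating_system) the authoring application. The sketch below reads these with a small hand-written parser. Function names are hypothetical, the storey count simply counts IFCBUILDINGSTOREY instances (an approximation of "number of stories"), and STEP string escapes such as \X2\ are not decoded; a production pipeline would more likely rely on a full IFC toolkit such as IfcOpenShell.

```python
import re

HEADER_ENTITY = re.compile(r"\b(FILE_DESCRIPTION|FILE_NAME|FILE_SCHEMA)\s*\(")
STOREY = re.compile(r"=\s*IFCBUILDINGSTOREY\s*\(", re.IGNORECASE)


def parse_step_list(s: str, i: int):
    """Parse the parenthesised STEP argument list whose '(' is at s[i].

    Returns (values, index after the closing ')'). Quoted strings become str,
    nested lists become list, anything else ($, *, numbers, .ENUMS.) stays raw text.
    """
    if s[i] != "(":
        raise ValueError(f"expected '(' at position {i}")
    values, i = [], i + 1
    while True:
        while s[i] in " \t\r\n,":
            i += 1
        if s[i] == ")":
            return values, i + 1
        if s[i] == "(":
            sub, i = parse_step_list(s, i)
            values.append(sub)
        elif s[i] == "'":
            buf, i = [], i + 1
            while True:
                if s[i] == "'":
                    if s[i + 1] == "'":  # '' encodes a literal quote
                        buf.append("'")
                        i += 2
                        continue
                    break
                buf.append(s[i])
                i += 1
            values.append("".join(buf))
            i += 1  # skip the closing quote
        else:
            start = i
            while s[i] not in ",)":
                i += 1
            values.append(s[start:i].strip())


def ifc_header_metadata(text: str) -> dict:
    header = text.split("ENDSEC;", 1)[0]  # the HEADER section ends at the first ENDSEC;
    fields = {}
    for m in HEADER_ENTITY.finditer(header):
        args, _ = parse_step_list(header, m.end() - 1)
        fields[m.group(1)] = args
    description = fields.get("FILE_DESCRIPTION", [[], ""])
    file_name = fields.get("FILE_NAME", [""] * 7)
    schema = fields.get("FILE_SCHEMA", [[]])
    return {
        "schema_version": schema[0][0] if schema[0] else None,
        "implementation_level": description[1],
        "mvd": next((d for d in description[0] if d.startswith("ViewDefinition")), None),
        "preprocessor": file_name[4],  # preprocessor_version
        "application": file_name[5],   # originating_system
        "number_of_stories": len(STOREY.findall(text)),
    }
```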
E57 extraction:
geo-referenced (yes/no)
total area (square metres)
number of floors
resolution settings
quality settings
sensor model, sensor serial number, …
total number of scans
total number of points
intensity (yes/no)
colour (yes/no)
reasons for spatial disturbance: distribution of detected elements
sub-quality parameters (positioning) – in %
e.g. distance error of matched references; occupied quadrants
sub-quality parameters (references) – in %
e.g. point drift, longitudinal mismatch
…
Potential candidates for technical metadata
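A corresponding sketch for E57, under stated assumptions: the 48-byte little-endian file header (signature, major/minor version, physical file length, XML offset and length, page size) locates an XML section describing the scans, and the file is divided into pages whose last 4 bytes hold a checksum, which this sketch skips without verifying. The element names used (data3D, points with recordCount, pointFields, sensorModel, coordinateMetadata) reflect our reading of ASTM E2807 and should be checked against the standard; treating a non-empty coordinateMetadata as "geo-referenced" is our own heuristic. Function names are hypothetical; libraries such as libE57Format implement the full format.

```python
import struct
import xml.etree.ElementTree as ET

# Assumed header layout: 8-byte signature, two uint32 version numbers, then four
# uint64 values (physical length, XML physical offset, XML logical length, page size).
HEADER = struct.Struct("<8sIIQQQQ")
NS = {"e57": "http://www.astm.org/COMMIT/E57/2010-e57-v1.0"}
CHECKSUM_SIZE = 4  # trailing bytes of every page; skipped here, not verified


def read_logical(data: bytes, physical_offset: int, length: int, page_size: int) -> bytes:
    """Collect `length` logical bytes from a physical offset, skipping page checksums."""
    payload = page_size - CHECKSUM_SIZE
    out = bytearray()
    pos = physical_offset
    while len(out) < length:
        if pos >= len(data):
            raise ValueError("file ends before the XML section does")
        page_start = pos - pos % page_size
        checksum_start = page_start + payload
        take = min(checksum_start - pos, length - len(out))
        out += data[pos:pos + take]
        pos += take
        if pos == checksum_start:
            pos = page_start + page_size  # jump over this page's checksum
    return bytes(out)


def e57_summary(data: bytes) -> dict:
    sig, major, minor, _phys_len, xml_off, xml_len, page_size = HEADER.unpack_from(data, 0)
    if sig != b"ASTM-E57":
        raise ValueError("missing ASTM-E57 signature")
    root = ET.fromstring(read_logical(data, xml_off, xml_len, page_size))
    scans = root.findall("e57:data3D/e57:vectorChild", NS)

    def has_field(scan, name: str) -> bool:
        # Present point fields are assumed to be Integer elements with value 1.
        el = scan.find(f"e57:pointFields/e57:{name}", NS)
        return el is not None and (el.text or "").strip() == "1"

    points = [s.find("e57:points", NS) for s in scans]
    coord = root.find("e57:coordinateMetadata", NS)
    return {
        "version": f"{major}.{minor}",
        "total_number_of_scans": len(scans),
        "total_number_of_points": sum(int(p.get("recordCount", "0")) for p in points if p is not None),
        "intensity": any(has_field(s, "intensityField") for s in scans),
        "colour": any(has_field(s, "colorRedField") for s in scans),
        "sensor_models": sorted({s.findtext("e57:sensorModel", "", NS) for s in scans} - {""}),
        "geo_referenced": coord is not None and bool((coord.text or "").strip()),
    }
```

Quality parameters such as point drift are not stored as plain header fields and would need separate analysis.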
22. We are currently developing a stakeholder questionnaire covering the following areas:
– data holdings (formats, SW, produced internally / externally)
– data storage / management (data carriers, backup practices, archiving practices)
– access (when, for what reason)
– experience with data loss (yes/no, reasons)
Looking for interested institutions and multipliers!
Want to help?