With growing storage capacities and falling storage prices, the paradigm of keeping everything prevails. However, keeping information accessible, usable and useful goes far beyond merely storing it, especially in the long run, and entails costs much higher than those of storage alone. This applies especially to content in Content Management Systems, where we increasingly create, manage and store (preserve) multimedia content that we might never access again because of its sheer volume.
To address these issues, we envision the concept of flexible managed forgetting, both for information that gradually loses importance and finally becomes obsolete and for redundant information. We will extend TYPO3 with preservation and forgetting. Forgetting will also reduce the user's cognitive burden regarding past activities and information in TYPO3, while still allowing access when needed. Just as our brain retrieves details of the past when we remember and make associations, the approach will offer similar means of retrieval.
Within the European Union's Seventh Framework Programme for Research (FP7), the "ForgetIT" project aims to build a solution to these problems. The project runs for three years, and TYPO3 was selected as the CMS to build upon because it is Open Source Software with an open and active community.
This talk gives an introduction to digital preservation and explains why companies can benefit greatly from it. It also demonstrates the current status of the research project.
An overview of the project can be found on the project's website (built with TYPO3, of course): http://www.forgetit-project.eu/
11. Size references
A simple text: an average Wikipedia article ≈ 3.78 kB (no markup)
Lots of text: complete Wikipedia ≈ 13.5 GB (text only, no markup)
An average image (12 MP) ≈ 1.3 MB (JPG, 90% quality, 24 bit/pixel)
An average movie stored on Blu-ray Disc ≈ 25.48 GB
12. 1955 – The IBM 355
Capacity: 12 MB
Cost: 6,233.33 USD/MB
Wikipedia articles: 3,250
Complete Wikipedia: 0 ✘
12MP images: 9
Blu-ray movies: 0 ✘
Storage per USD: 0.16 kB
13. 1970 – The IBM 3330
Capacity: 100 MB
Cost: 259.70 USD/MB
Wikipedia articles: 27,089
Complete Wikipedia: 0 ✘
12MP images: 76
Blu-ray movies: 0 ✘
Storage per USD: 3.94 kB
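The figures on the two storage slides follow from the size references: each count is the drive capacity divided by the item size, rounded down, and the last figure is how much storage one US dollar bought. They match when 1 MB is taken as 1024 kB. A minimal Python sketch; the helper names `how_many_fit` and `kb_per_usd` are hypothetical, and both the binary units and the per-dollar reading of the last figure are inferred from the arithmetic rather than stated on the slides:

```python
# Reproduces the storage slide figures from the size references.
# Assumption (inferred from the numbers, not stated on the slides): binary units.
KB_PER_MB = 1024
MB_PER_GB = 1024

# Size references from the "Size references" slide, converted to kB
SIZE_REFERENCES_KB = {
    "Wikipedia article": 3.78,
    "complete Wikipedia": 13.5 * MB_PER_GB * KB_PER_MB,
    "12MP JPG image": 1.3 * KB_PER_MB,
    "Blu-ray movie": 25.48 * MB_PER_GB * KB_PER_MB,
}


def how_many_fit(capacity_mb: float) -> dict[str, int]:
    """Hypothetical helper: how many whole items of each reference size fit on a drive."""
    capacity_kb = capacity_mb * KB_PER_MB
    return {name: int(capacity_kb // size_kb)
            for name, size_kb in SIZE_REFERENCES_KB.items()}


def kb_per_usd(cost_usd_per_mb: float) -> float:
    """Hypothetical helper: storage in kB that one US dollar buys at the given price."""
    return KB_PER_MB / cost_usd_per_mb


for year, model, capacity_mb, cost in [(1955, "IBM 355", 12, 6233.33),
                                       (1970, "IBM 3330", 100, 259.70)]:
    print(year, model, how_many_fit(capacity_mb), f"{kb_per_usd(cost):.2f} kB/USD")
```

Running it yields 3,250 articles and 9 images (0.16 kB per USD) for the IBM 355, and 27,089 articles and 76 images (3.94 kB per USD) for the IBM 3330, matching the slides.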
26. ForgetIT project overview
Consortium of 11 partners
Project start was in February 2013
3 years of research & development
http://www.forgetit-project.eu
The ForgetIT project is funded by the EC within the 7th Framework Programme under the objective "Digital Preservation" (GA 600826).
27. Project Partners 1/2
Centre for Research and Technology Hellas
dkd Internet Service GmbH
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Eurix Srl
Gottfried Wilhelm Leibniz Universität Hannover
28. Project Partners 2/2
IBM Israel - Science and Technology Ltd
Luleå Tekniska Universitet
The Chancellor, Masters and Scholars of the University of Oxford
The University of Edinburgh
The University of Sheffield
Turk Telekomunikasyon AS
29. Inspiring people to share!
TYPO3 is the CMS used for the organisational use cases
TYPO3 was chosen because it’s Open Source
We want to raise awareness of preservation
We will publish our modules under open source licenses
35. What is preservation?
“Preservation: The protection of cultural property through activities that minimize chemical and physical deterioration and damage and that prevent loss of informational content. The primary goal of preservation is to prolong the existence of cultural property.”
Preservation 101
38. Problems are caused by
storage medium (disks, tapes, DVD, etc.)
format of the data
availability of the software or operating system
possible encryption
40. “The digital dark age is a possible future situation where it will be difficult or impossible to read historical electronic documents and multimedia, because they have been stored in an obsolete and obscure file format.”
Wikipedia
41. Preserving a website is not trivial
What do you want to preserve?
Content only?
Content and design?
How often? (e.g. stock prices vs. the company history page)
How do you deal with browser differences?
How do you preserve functionality? (e.g. an insurance fee calculator)
48. Organisational Use Cases
Digital Asset Management
Versioning
Archiving a complete Website
Individual genres and their specific requirements
Example: Press Release
50. Elements of a Press Release
text
image
links
documents
51. Meta information
Press Information Spielwarenmesse
Global Toy Conference Now on Saturday at the Spielwarenmesse
* Customised programme for retailers: “How to get your customer into the shop”
* Conference will take place for the 5th time in Nuremberg on 1 February 2014
All around the world, retailers are wondering how they can still get their customers in their shops in the age of the Internet – because competition for the sale of consumer goods online is growing dramatically. With the topic “How to Get Customers into Your Shop – Successful Pricing, Presentation and Selling” the Global Toy Conference of the Spielwarenmesse demonstrates what parameters business owners can adjust for the future. The conference will take place for the first time in the St Petersburg hall in the NCC East on Saturday. The new earlier date means that more international retailers can take advantage of the knowledge on offer at the toy industry's leading trade fair – from 9 a.m. to 4 p.m. on 1 February 2014.
...
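For preservation, the elements of a press release (text, images, links, documents) and its meta information, such as the event and date in the example above, have to be kept together. A minimal Python sketch of such a record; the class `PressRelease`, its field names and the `missing_elements` check are hypothetical illustrations, not part of TYPO3 or the ForgetIT modules:

```python
from dataclasses import dataclass, field


@dataclass
class PressRelease:
    """Hypothetical model: press release elements kept together with their meta information."""
    text: str
    images: list[str] = field(default_factory=list)     # image file references
    links: list[str] = field(default_factory=list)      # link targets (URLs)
    documents: list[str] = field(default_factory=list)  # attached documents, e.g. PDFs
    meta: dict[str, str] = field(default_factory=dict)  # meta information, e.g. event and date

    def missing_elements(self) -> list[str]:
        """Return the names of the four elements that are empty, e.g. to review before archiving."""
        elements = {"text": self.text, "image": self.images,
                    "links": self.links, "documents": self.documents}
        return [name for name, value in elements.items() if not value]
```

Keeping the meta information inside the same record is one way to ensure it is archived with the content rather than separately.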
54. Delete, Keep, Archive
Levels of significance
legal value: keep for the legally required period
present value: keep for x days
archive value: keep forever
trigger value: check significance
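The levels of significance can be read as a simple decision rule. A minimal Python sketch, assuming each item carries exactly one significance level; the names `Significance`, `Action`, `ContentItem` and `decide`, and the retention periods passed as parameters, are hypothetical. Treating an item whose retention period has expired as a deletion candidate is also an assumption, since the slide does not say what happens then:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Significance(Enum):
    LEGAL = "legal value"
    PRESENT = "present value"
    ARCHIVE = "archive value"
    TRIGGER = "trigger value"


class Action(Enum):
    KEEP = "keep"
    ARCHIVE = "archive"
    DELETE = "delete"
    CHECK = "check significance"


@dataclass
class ContentItem:
    title: str
    significance: Significance
    created: date


def decide(item: ContentItem, today: date, legal_days: int, present_days: int) -> Action:
    """Hypothetical rule mapping a significance level to an action."""
    if item.significance is Significance.ARCHIVE:
        return Action.ARCHIVE  # archive value: keep forever
    if item.significance is Significance.TRIGGER:
        return Action.CHECK    # trigger value: re-check the significance
    # legal value: keep for the legal period; present value: keep for x days
    limit = legal_days if item.significance is Significance.LEGAL else present_days
    age_days = (today - item.created).days
    # Assumption: once the period has expired, the item becomes a deletion candidate
    return Action.KEEP if age_days < limit else Action.DELETE
```

For example, an item with present value created 31 days ago is kept with `present_days=90` and becomes a deletion candidate with `present_days=10`.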
58. [Diagram: content types. External: Digital Asset (DAM), i.e. media and assets with meta info, etc. Internal: editable content (media, assets, meta info) and structure (code, users, plugins, extensions, etc.), each with meta info]
59. [Diagram: information levels from Info Level 1 to Info Level 4, etc. (labelled static, (semi)automatic and dynamic), applied to meta info, media, assets, editable content and structure (code, users, plugins, extensions, etc.); destinations: Output, Archive 1, Archive 2, Delete]
83. Do you remember the details?
Which ocean was the ForgetIT Team examining?
Mediterranean Sea
How many people of the ForgetIT Team were carrying a bag?
How many barcodes are on the Western Digital WD600AB?
How many pictures in the shoebox image are mostly blue?
85. We’d love to see you participate!
Share your thoughts with us
Take our short survey: http://tinyurl.com/forgetit-webarchiving
Tell us your use cases
Join the development of TYPO3 features
89. References (Sources) 1/2
Size of Wikipedia (as of 2013-10-04): https://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
Average JPG size: http://web.forret.com/tools/megapixel.asp?title=12+Megapixel+camera&width=4000&height=3000
Average movie size: http://answers.yahoo.com/question/index?qid=20110807095141AABGQm8
Storage Prices: http://www.jcmit.com/diskprice.htm
90. References (Sources) 2/2
ForgetIT website: http://www.forgetit-project.eu
Preservation: http://unfacilitated.preservation101.org/session1/expl_whatis-definitions.asp
Digital Dark Age: https://en.wikipedia.org/wiki/Digital_dark_age
95. References (Images) 4/8
Game pieces by Søren Schaffstein
Managed Forgetting: http://www.istockphoto.com/stockphoto-3533508-colorful-memos.php?st=0320b45
Synergetic Preservation: http://www.istockphoto.com/stockphoto-13301920-goldfish-jump.php
Contextualised Remembering: http://www.istockphoto.com/stockphoto-14370511-shoebox-of-old-photos-too.php
96. References (Images) 5/8
Cans: http://www.istockphoto.com/stock-photo-16948268-threemetallic-goods-can-with-key.php
5 1/4” Disk: https://secure.flickr.com/photos/twicepix/4330813840/sizes/z/in/photostream/
5 1/4” Disk Drawing: https://secure.flickr.com/photos/flattop341/2094771560/sizes/z/in/photostream/
Ami Pro: http://www.os2museum.com/wp/?attachment_id=99
Digital Dark Age by Søren Schaffstein