The document discusses digital preservation and the ForgetIT project. It introduces the problems of preserving large amounts of digital content and losing access to it over time. The ForgetIT project aims to apply the human concepts of preservation and forgetting to computer systems, preserving only the most valuable content while allowing other content to be forgotten. It does this by transforming content into semantic linked data and measuring key values to determine what should be preserved or forgotten.
2. Let me start with a quote
“Nowadays people know the price of everything and the value of nothing.”
― Oscar Wilde, The Picture of Dorian Gray
TYPO3 Congres Amsterdam 2014
3. The problem 1
The solution 2
Semantic web 3
The values 4
The strategy 5
Questions 6
Contact 7
6. Welcome to the digital information age... a never-ending flood of content!
Technology enables us to produce nearly unlimited content
We are still “hunters and gatherers”
Storage space feels “infinite”, but resources are limited
Evolution of technology = new standards and formats
7. What happens if we lose the ability to view/retrieve all this content?
8. So how do we handle this?
9. We preserve!
“Preservation — The protection of cultural property
through activities that minimize chemical and physical
deterioration and damage and that prevent loss of
informational content. The primary goal of preservation
is to prolong the existence of cultural property.”
dkd staff Meeting, 13.08.2014, Frankfurt
Preservation 101
10. Preserving a website is not trivial
What do you want to preserve?
Content only?
Content and Design?
How often? Stock prices vs. Company History page
How do you deal with browser differences?
How do you preserve functionality? E.g. insurance fee calculator
11. What do you preserve?
12. How do you preserve?
14. The solution
A project funded by the EU called ForgetIT
15. Concise Preservation by combining Managed Forgetting and Contextualized Remembering
16. The Project - Facts
EU research project
Part of the Seventh Framework Programme
Countries involved: Germany, Sweden, Israel, Turkey, Greece, United Kingdom, Italy
Project duration: 2013-2016
18. The Project - Goals
ForgetIT aims to transfer the human Preservation and Forgetting concepts to computer systems
Meaningful Preservation and Forgetting
Consider differences between individuals and organizations and demonstrate via use cases
19. The Project - our approach to the organizational part
Transform the basis of content management to become semantic, using a linked data approach
Subsequently measure key value indicators to determine the value of content, its relevance and benefits
Enable humans and systems to actively preserve or forget
20. How does human memory work?
22. What have you stored in your mind?
What is the dkd color code?
What color was the word “Holz” written on the Post-it?
How many wooden sticks sat in the vase?
Were there 2, 3 or 4 green Copic markers?
Was the laptop on the lounge table open or closed?
23. What have you stored in your mind?
25. Role in the Preserve-or-Forget
Active system
Preserve-or-Forget Middleware
Archival Information system
26. The link to human memory
Digital preservation
Forgetting without context
Preservation with context
Preservation with learning
33. “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
Tim Berners-Lee
36. What people see in a website
37. What do machines see in a website
38. What machines could see in a website
Brandenburg Gate
Berlin
Destination
Location
Social Media
Subscription
dkd staff Meeting, 18.09.2014, Frankfurt
39. Let’s start simple
Brandenburg Gate
Type: Landmark
Location: Berlin
Built: 1788-1791
40. Let’s start simple
Brandenburg Gate
Type: Landmark
Location: Berlin
Built: 1788-1791
Berlin
Type: Capital
Location: Germany
Area: 891.85 km²
42. Triples
Berlin is a capital
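A minimal sketch of how the facts from the "Let's start simple" slides could be written down as triples. The `Triple` type, the lowercase predicate names, and the `describe` helper are assumptions made for illustration, not part of the ForgetIT code.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The facts shown on the slides, one triple per statement.
triples = [
    Triple("Brandenburg Gate", "type", "Landmark"),
    Triple("Brandenburg Gate", "location", "Berlin"),
    Triple("Brandenburg Gate", "built", "1788-1791"),
    Triple("Berlin", "type", "Capital"),
    Triple("Berlin", "location", "Germany"),
    Triple("Berlin", "area", "891.85 km²"),
]


def describe(subject, statements):
    """Collect every predicate/object pair stated about one subject."""
    return {t.predicate: t.obj for t in statements if t.subject == subject}


print(describe("Berlin", triples))
```

"Berlin is a capital" is then simply the triple (Berlin, type, Capital).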
44. How do we implement triples?
URI URI URI
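A small sketch of the "URI URI URI" idea: each part of a triple becomes a URI, and the statement can be written as one N-Triples line. The `http://example.org/` namespace and both helper functions are placeholders chosen for this example; a real deployment would likely reuse established vocabularies instead.

```python
# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/"


def uri(local_name: str) -> str:
    """Build a URI from a plain name by replacing spaces with underscores."""
    return BASE + local_name.replace(" ", "_")


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Format one statement as an N-Triples line: <s> <p> <o> ."""
    return f"<{uri(subject)}> <{uri(predicate)}> <{uri(obj)}> ."


line = to_ntriples("Berlin", "type", "Capital")
print(line)
```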
45. Ontologies
define hierarchies
help us describe relations
provide the general structure
ensure interdisciplinary understanding
46. Hierarchy in an ontology
Europe
Germany
Berlin
Brandenburg Gate
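The hierarchy above can be sketched as a parent lookup that is followed upwards. The `PARENT` mapping and the function names are assumptions for this example; an actual ontology would express the same thing with a transitive relation.

```python
# Each place points to the place that contains it (hypothetical data structure).
PARENT = {
    "Brandenburg Gate": "Berlin",
    "Berlin": "Germany",
    "Germany": "Europe",
}


def ancestors(place):
    """Walk up the hierarchy and return all containing places, nearest first."""
    chain = []
    while place in PARENT:
        place = PARENT[place]
        chain.append(place)
    return chain


def is_within(place, region):
    """True if region appears anywhere above place in the hierarchy."""
    return region in ancestors(place)


print(ancestors("Brandenburg Gate"))
print(is_within("Brandenburg Gate", "Europe"))
```

Because the relation is followed step by step, a machine can conclude that the Brandenburg Gate is in Europe even though nobody stated that directly.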
48. [Figure: excerpt of the Linked Open Data cloud diagram. Datasets shown include DBpedia, Freebase, YAGO, GeoNames, WordNet, UniProt, and BBC Music.]
49. [Figure: Linked Open Data cloud diagram, as of September 2011. Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences.]
Author: Anja Jentzsch, source: http://en.wikipedia.org/wiki/File:LOD_Cloud_Diagram_as_of_September_2011.png
50. Custom ontologies
Industry specific ontology
Geographic locations ontology
Your very own company ontology
51. Semantic search
create specific and precise queries
capture the meaning of the intended information
receive cumulative results from different sources
image search through concept detection
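A rough sketch of what a precise query over cumulative results could look like: triples from two sources are merged and then matched against a pattern. `SOURCE_A`, `SOURCE_B`, `match`, and `landmarks_in_capitals` are hypothetical names; real systems would typically use a query language such as SPARQL.

```python
# Two hypothetical sources, each holding (subject, predicate, object) triples.
SOURCE_A = [
    ("Brandenburg Gate", "type", "Landmark"),
    ("Brandenburg Gate", "location", "Berlin"),
]
SOURCE_B = [
    ("Berlin", "type", "Capital"),
    ("Berlin", "location", "Germany"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples that agree with every part of the pattern that is not None."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def landmarks_in_capitals(triples):
    """Answer the precise question: which landmarks are located in a capital?"""
    capitals = {s for s, _, _ in match(triples, predicate="type", obj="Capital")}
    landmarks = {s for s, _, _ in match(triples, predicate="type", obj="Landmark")}
    return sorted(
        s
        for s, _, o in match(triples, predicate="location")
        if s in landmarks and o in capitals
    )


merged = SOURCE_A + SOURCE_B
print(landmarks_in_capitals(merged))
```

Neither source alone can answer the question; the result only appears once both are combined.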
52. Concept detection
If we enable computers to see the content of an image, they can detect concepts and give us accurate image results
Derive context and meaning
53. Content extraction
Once a computer can understand the content of a text, it can reduce redundant text (unnecessary words, reiterations)
Integration and reuse of information
54. Content value
Custom ontologies allow a company to assess a semantic object’s value
The value redefines itself over time
Think of a 2D map to locate important terms/products/events from a company’s perspective
63. Further assumptions on Content Value
Relevance influences the value in the ontology over time
Changes in a company’s strategy or portfolio will force a reassessment of Content and Content Value
Who created the content could be important when calculating the Content Value
65. Strategy
Four strategies that can be combined to gain access to content value
66. How to define your own content value (1)
Start with some inventory on the content
What kind of content do you create? News? Pages? PDF?
How long does it stay in its actual state? (Content Lifecycle)
When does it expire? What happens with archived content?
How much content do you create?
…
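A possible starting point for such an inventory, sketched with made-up records. The `INVENTORY` entries, field names, and dates are invented for illustration; in practice they would come from the CMS database.

```python
from collections import Counter
from datetime import date

# Hypothetical inventory records; real data would be exported from the CMS.
INVENTORY = [
    {"id": "news-1", "kind": "news", "created": date(2014, 1, 10), "expires": date(2014, 4, 10)},
    {"id": "page-7", "kind": "page", "created": date(2013, 6, 1), "expires": None},
    {"id": "pdf-3", "kind": "pdf", "created": date(2012, 3, 15), "expires": date(2013, 3, 15)},
]


def count_by_kind(items):
    """How much content of each kind do you create?"""
    return Counter(item["kind"] for item in items)


def expired(items, today):
    """Which items have passed their expiry date? Items without one never expire."""
    return [
        item["id"]
        for item in items
        if item["expires"] is not None and item["expires"] < today
    ]


print(count_by_kind(INVENTORY))
print(expired(INVENTORY, date(2014, 2, 1)))
```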
67. How to define your own content value (2)
Look at the people creating the content
Create a social graph of the people and the content they create
Identify the main nodes in the graph
Calculate the network size of those nodes
Identify those that have the most impact based on borders crossed
…
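One way to sketch the social graph step, under stated assumptions: people are connected when they worked on the same content item, "network size" is read as the number of direct connections, and "borders crossed" is read as the number of other departments a person connects to. All names and data below are hypothetical.

```python
from itertools import combinations

# Hypothetical data: which people created which content, and where they work.
AUTHORS_BY_ITEM = {
    "news-1": ["ann", "bob"],
    "page-7": ["ann", "cho"],
    "pdf-3": ["cho", "dee"],
    "news-2": ["ann"],
}
DEPARTMENT = {"ann": "marketing", "bob": "marketing", "cho": "sales", "dee": "support"}


def build_graph(authors_by_item):
    """Connect two people whenever they worked on the same content item."""
    graph = {}
    for authors in authors_by_item.values():
        for person in authors:
            graph.setdefault(person, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def network_size(graph, person):
    """Number of direct collaborators (one simple reading of network size)."""
    return len(graph[person])


def borders_crossed(graph, person, department):
    """Number of other departments this person is directly connected to."""
    return len({department[n] for n in graph[person]} - {department[person]})


graph = build_graph(AUTHORS_BY_ITEM)
for person in sorted(graph, key=lambda p: borders_crossed(graph, p, DEPARTMENT), reverse=True):
    print(person, network_size(graph, person), borders_crossed(graph, person, DEPARTMENT))
```

In this toy data, "ann" and "cho" have the same network size, but "cho" reaches two other departments and would count as having more impact under this reading.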
68. How to define your own content value (3)
Look at the analytics of the content
Where is your hot / cold content, based on external usage?
Which content gets the most reactions, such as shares, mentions, comments?
Bear in mind that popularity might be an indicator, but it can sometimes mislead you.
…
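A minimal sketch of the analytics view. The figures, the page names, and the `HOT_VIEWS` threshold are invented; real numbers would come from a web analytics export, and the threshold is a judgment call.

```python
# Hypothetical analytics export: one record per content item.
ANALYTICS = {
    "company-history": {"views": 120, "shares": 1, "mentions": 0, "comments": 2},
    "stock-prices": {"views": 9500, "shares": 40, "mentions": 12, "comments": 5},
    "fee-calculator": {"views": 3100, "shares": 3, "mentions": 1, "comments": 0},
}

HOT_VIEWS = 1000  # assumed threshold separating hot from cold content


def temperature(stats):
    """Label content as hot or cold based on external usage."""
    return "hot" if stats["views"] >= HOT_VIEWS else "cold"


def reactions(stats):
    """Total of shares, mentions and comments."""
    return stats["shares"] + stats["mentions"] + stats["comments"]


ranked = sorted(ANALYTICS, key=lambda name: reactions(ANALYTICS[name]), reverse=True)
for name in ranked:
    print(name, temperature(ANALYTICS[name]), reactions(ANALYTICS[name]))
```

Here the rarely visited company history page ends up "cold", which illustrates the warning above: low popularity does not necessarily mean low value.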
69. How to define your own content value (4)
Look inside the content you create
What are the top 100 words in your content (without stop words)?
Which entities belong to you? Which to your industry?
How is the density of your entities built up?
Cluster documents based on the taxonomies they use
Identify orphaned content
…
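Two of these checks can be sketched with the standard library alone: counting the most frequent words without stop words, and finding orphaned content that no other page links to. The documents, the link map, and the tiny stop word list are placeholders for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}

# Hypothetical page texts and internal links (page -> pages it links to).
DOCS = {
    "home": "Preservation of content and the value of content",
    "history": "The history of the company",
}
LINKS = {
    "home": ["history", "products"],
    "history": ["home"],
    "products": [],
    "old-campaign": ["home"],
}


def top_words(docs, n=100):
    """Most frequent words across all documents, ignoring stop words."""
    counts = Counter()
    for text in docs.values():
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


def orphaned(links, roots=("home",)):
    """Pages that no other page links to, excluding known entry points."""
    linked = {target for targets in links.values() for target in targets}
    return sorted(p for p in links if p not in linked and p not in roots)


print(top_words(DOCS, 3))
print(orphaned(LINKS))
```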
72. Contact
Olivier Dobberkau <olivier.dobberkau@dkd.de>
ForgetIT Project Website: www.forgetit-project.eu
Twitter: @ForgetITProject
Code will be published on GitHub in 2015
73. Thank you for your attention!