GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics
[Digital Preservation]
“This project has received funding from the European Union’s Seventh
Framework Programme for research, technological development and
demonstration under grant agreement no. 601138”.
Semi-automated metadata extraction
in the long term
Emma Tonkin, King’s College London
DPC Workshop, Belfast, Dec 2015
Structure of presentation
 Introduction to PERICLES
 Layers of Metadata
 Sources of Metadata
 Time, space and data
 Semi-automated metadata as mitigating factor
Introduction to Pericles
 Four-year Integrated Project (2013-2017) funded by the European Union
under its Seventh Framework Programme
 Promoting and Enhancing Reuse of Information throughout the Content
Lifecycle taking account of Evolving Semantics
 Two (or three) domains:
− Digital artworks, such as interactive software-based installations, and
other digital media from Tate's collections
− Material from Tate's archives
− Experimental scientific data originating from the European Space
Agency and International Space Station.
Model-driven approach
 Essentially all archives are based around some conceptual model of the
material held
 PERICLES applies formal models to describe
− Objects
− Entities associated with objects
− Broader community
 These models support processes such as appraisal and QA, and
consequently functionality such as maintenance and actions taken for
sustainability
 A broad variety of models are under consideration: semantic (ontological)
models to formally describe objects; social network graphs to describe
community; statistical models to describe technology obsolescence...
Layers of Metadata
Open Archival Information System
 OAIS reference model
− “conceptual framework for an archival system dedicated to preserving
and maintaining access to digital information over the long term”
(Lavoie, B. (2000). Meeting the challenges of digital preservation: The OAIS reference model)
 OAIS-compliance
− adherence to ISO 14721:2003 or (now) ISO 14721:2012
− Specifies conceptual framework, functional model, information model
OAIS information model
Descriptive metadata
 Supporting humans and machines
 Goal: interpreting data object
 Not always possible to automatically interpret data objects on any level
(some are fully opaque)
 Consider:
− 'Unstructured' natural-language texts, such as letters, books, articles
− Images of artworks
− Images of letters
− Recordings of audiovisual presentations
− Complex data files
Sources of descriptive metadata
Automated metadata extraction
 Popular view on indexing metadata:
− “the more, the merrier”
 Risks of low-quality metadata:
− Low accuracy on search and browse tasks; occasionally embarrassing
misinterpretations
 Benefits:
− Additional metadata can improve search indexing
How good is automated metadata
extraction?
 Varies significantly depending on the precise task and source material
 Automated metadata extraction tends to apply probabilistic (machine
learning) or heuristic approaches
 Machine-eye view:
− Describe what is present
− Infer what is not present, based on:
 Knowledge base
 Comparison with other items
 Learning from training examples ('supervised learning')
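The two halves of the machine-eye view can be sketched in a few lines. This is a minimal illustration only, not a method described in the talk: the stopword list, the Jaccard similarity measure, the function names and the two labelled examples are all assumptions made for the sketch. The first function describes what is present (frequent terms); the second infers a subject that is not stated, by comparison with other, already-labelled items.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the sketch.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}

def describe(text, top_n=3):
    """Heuristic step: report the most frequent non-stopword terms present."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [term for term, _ in Counter(words).most_common(top_n)]

def jaccard(a, b):
    """Overlap between two term lists, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_subject(text, labelled_examples):
    """Inference step: borrow the label of the most similar labelled item."""
    terms = describe(text, top_n=10)
    best = max(labelled_examples,
               key=lambda ex: jaccard(terms, describe(ex[0], top_n=10)))
    return best[1]

# Hypothetical labelled items standing in for training examples.
examples = [
    ("Telemetry from the solar sensor on the space station", "space science"),
    ("Oil painting on canvas, acquired by the gallery", "artwork"),
]

present = describe("Sensor telemetry: sensor drift in solar sensor readings")
subject = infer_subject("Calibration of the solar sensor telemetry", examples)
print(present, subject)
```

Because the inferred label is only as good as the labelled pool and the similarity measure, output of this kind is better treated as a suggestion for review than as authoritative metadata.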
Crowdsourcing metadata
 The 'phone a friend' approach to metadata generation
− Make material available to public
− Encourage them to annotate (example: social tagging)
− Examine the result
 The likely result:
− Some material extensively annotated; some descriptive annotations;
some formally structured; some personal ('cryptic')
− Some/most material receives no notice and is not annotated at all
 Mitigation: engineer more consistent coverage through, for example,
gamification (see Galaxy Zoo)
 Identify incentives that encourage public to contribute
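The "examine the result" step can start with a simple coverage check. The sketch below is illustrative only: the item identifiers, tags and function name are hypothetical, and it assumes annotations arrive as (item, tag) pairs. It reports which items received no annotation at all and which item attracted the most.

```python
from collections import Counter

def coverage_report(item_ids, annotations):
    """Summarise how evenly public annotations are spread across items.

    annotations is a list of (item_id, tag) pairs contributed by the public.
    """
    per_item = Counter(item for item, _ in annotations)
    untagged = [i for i in item_ids if per_item[i] == 0]
    return {
        "items": len(item_ids),
        "untagged": untagged,
        "untagged_share": len(untagged) / len(item_ids) if item_ids else 0.0,
        "most_tagged": per_item.most_common(1)[0][0] if per_item else None,
    }

# Hypothetical collection: one popular item, one lightly tagged, two ignored.
items = ["img-001", "img-002", "img-003", "img-004"]
tags = [
    ("img-001", "portrait"),
    ("img-001", "oil"),
    ("img-001", "my fav"),      # a personal ('cryptic') tag
    ("img-003", "landscape"),
]

report = coverage_report(items, tags)
print(report)
```

A high untagged share is the signal that incentives such as gamification may be needed to steer attention towards neglected material.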
Capturing 'live' metadata
 If the environment is accessible at the time of creation:
− Technical 'live' metadata may be captured
− Within PERICLES, this is referred to as 'significant environment
information'
− Example: steps in creation, time of creation, contextual relevance of
other files…
 Another sort of 'live' metadata emerges from observation of behaviour of
those engaging with the data
− Interaction with search/browse interfaces (cf. information scent)
− Satisfaction with results
− Patterns of sharing and reuse (information diffusion on social networks,
for example)
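As a rough illustration of capturing technical 'live' metadata while the environment is still accessible, the sketch below records a few facts about a file and its surroundings. It is not the project's own tooling: the function name, the choice of fields and the use of sibling files as a stand-in for "contextual relevance of other files" are assumptions made for the example, and it uses only the Python standard library.

```python
import os
import platform
import tempfile
from datetime import datetime, timezone

def capture_environment(path):
    """Record a few facts about a file and its surroundings at capture time."""
    folder = os.path.dirname(os.path.abspath(path))
    name = os.path.basename(path)
    stat = os.stat(path)
    return {
        "file": name,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc).isoformat(),
        "captured_utc": datetime.now(timezone.utc).isoformat(),
        "platform": platform.platform(),
        "python_version": platform.python_version(),
        # Sibling files are a crude proxy for related context.
        "siblings": sorted(f for f in os.listdir(folder) if f != name),
    }

# Demonstration in a throwaway directory with two hypothetical files.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "results.csv")
    with open(target, "w") as fh:
        fh.write("t,value\n0,1.5\n")
    with open(os.path.join(d, "calibration.txt"), "w") as fh:
        fh.write("offset=0.1\n")
    record = capture_environment(target)

print(record["file"], record["siblings"])
```

The point of the sketch is timing: each of these values is trivial to read at creation and may be impossible to reconstruct once the environment has gone.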
Time, space and data
Theoretical reach of information
Image source: S Korotkiy
 Receiving the signal is only the start
 Can we decode the signal?
− Technical decoding
− Practical comprehension
 Confounding factors in decoding metadata:
− Language
− Dialect
− Prerequisite knowledge
Practical reach of information
Language: space travel
Language change: Time travel
 Language may be viewed as a complex adaptive system (Beckner et al.,
2007)
− Made up of many tiny parts - people talking, writing, gesturing
− Adaptive, because we change our behaviour based on past
interactions
− Many factors influence its development: biology of perception; social
structure; experience
 Probabilistic processes underlie language change: collective experience
and eventual consensus
Example: Photogram (Getty Art &
Architecture Thesaurus)
The challenges of decreasing
accessibility
 Unfamiliar data
− Technical encoding – well-understood problems
− Challenges of internationalisation
 Unfamiliar texts
− Conventions and best practices change over time
− Coherence degrades long before it fails entirely: older texts are slower
and more effortful to read, and machines trained on modern texts are likely
to encounter issues with texts outside that timeframe
 Challenges of unfamiliar artefacts
− There are many more questions that may be asked about an object: for
example, in the case of artworks, “artist's intent” may be significant
− Once lost, these are very difficult to infer
Understanding unfamiliar material
 Understanding unfamiliar material, though hard, is easier than finding it
 Separate processes:
− Recognising a term
− Identifying (generating) a term
 Recognition is faster and more reliable
 Why:
− Recognising a term: connecting term to concept
− Generating terms: search around a concept looking through large pool
of candidate terms for the one that might work best here
− Think yourself into the curator's shoes: what terms might they have
used for the concept that interests you, and why?
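The asymmetry between the two processes can be seen in a toy vocabulary. The sketch below is illustrative only: the vocabulary entries are hypothetical examples written for this sketch, not records taken from a real thesaurus. Recognition is a single lookup from a known term to its concept; generation has to scan the whole pool of candidate terms to find those that might name a concept.

```python
# Hypothetical mini-vocabulary mapping terms to concepts.
VOCABULARY = {
    "photogram": "cameraless photograph",
    "rayograph": "cameraless photograph",
    "daguerreotype": "photograph on silvered plate",
}

def recognise(term):
    """Recognition: one direct lookup from a term to its concept."""
    return VOCABULARY.get(term.lower())

def generate(concept):
    """Generation: search every candidate term for ones naming the concept."""
    return sorted(t for t, c in VOCABULARY.items() if c == concept)

print(recognise("Photogram"))
print(generate("cameraless photograph"))
```

In a real collection the searcher does not have the vocabulary to hand, which is why guessing the curator's term is so much harder than recognising it once seen.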
Term recognition vs generation
Semi-automated metadata as
mitigating factor
 Peirce: semiotic triad, relating symbol, object and interpreter
− Software agents: machine-level features (machine perception) –
words found in documents, colours, shapes or patterns found in
images…
− Human agents: perception; comprehension; application of relevant
knowledge; interpretation into a set of concepts; encoding
observations into terms
 Observing the behaviour of human agents throughout the lifecycle of the
digital object allows us to study change in manual interpretation and
encoding
 This permits us to characterise these patterns of change
 It also permits software agents to be brought into line with changing norms
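One simple way to characterise such patterns of change is to track which term human annotators most often use for a concept in each period, and to flag the point at which the dominant term switches. The sketch below assumes observations are available as (year, concept, term) triples and groups them by decade; the data, function names and the decade granularity are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def dominant_terms(observations):
    """Return {concept: [(decade, most-used term), ...]} in time order.

    observations is an iterable of (year, concept, term) triples.
    """
    counts = defaultdict(Counter)
    for year, concept, term in observations:
        counts[(concept, year // 10 * 10)][term] += 1
    timeline = defaultdict(list)
    for concept, decade in sorted(counts):
        top_term = counts[(concept, decade)].most_common(1)[0][0]
        timeline[concept].append((decade, top_term))
    return dict(timeline)

def shifts(timeline):
    """List the decades in which the dominant term for a concept changed."""
    changes = []
    for concept, series in timeline.items():
        for (_, previous), (decade, current) in zip(series, series[1:]):
            if current != previous:
                changes.append((concept, decade, previous, current))
    return changes

# Hypothetical annotation log for a single concept.
log = [
    (1935, "radio receiver", "wireless"),
    (1936, "radio receiver", "radio"),
    (1938, "radio receiver", "wireless"),
    (1962, "radio receiver", "radio"),
    (1965, "radio receiver", "radio"),
    (1968, "radio receiver", "wireless"),
]

timeline = dominant_terms(log)
changes = shifts(timeline)
print(timeline)
print(changes)
```

A detected shift of this kind is the cue for updating the terms a software agent assigns, so that automated output stays aligned with current human usage.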
Relating concept, feature, agent and
term
Conclusion
 PERICLES combines
− model-led approaches to data management
− data-led approaches to modelling and characterising the changing
environment and context(s) of reuse
 Approach acknowledges dynamical nature of system in which reuse occurs
 Downside: such an approach requires ongoing availability of material
(ethically) gleaned from observational data
− Consequently, a closed archive, or an archive that excites little
interest, remains difficult to sustain unless data is sourced elsewhere
 In conclusion, therefore, data-led approaches gain from joint infrastructure
and open data
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 

Kürzlich hochgeladen (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Semi-automated metadata extraction in the long term

  • 1. GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
      Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]
      “This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 601138.”
      Semi-automated metadata extraction in the long term
      Emma Tonkin, King’s College London
      DPC Workshop, Belfast, Dec 2015
  • 2. Structure of presentation
      - Introduction to PERICLES
      - Layers of Metadata
      - Sources of Metadata
      - Time, space and data
      - Semi-automated metadata as mitigating factor
  • 4. Introduction to PERICLES
      - Four-year Integrated Project (2013-2017) funded by the European Union under its Seventh Framework Programme
      - Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics
      - Two (or three) domains:
          - Digital artworks, such as interactive software-based installations, and other digital media from Tate's collections
          - Material from Tate's archives
          - Experimental scientific data originating from the European Space Agency and International Space Station
  • 5. Model-driven approach
      - Essentially all archives are based around some conceptual model of the material held
      - PERICLES applies formal models to describe:
          - Objects
          - Entities associated with objects
          - Broader community
      - These models support processes such as appraisal and QA, and consequently functionality such as maintenance and actions taken for sustainability
      - A broad variety of models is under consideration: semantic (ontological) models to formally describe objects; social network graphs to describe community; statistical models to describe technology obsolescence...
  • 7. Open Archival Information System
      - OAIS reference model
          - “conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term” (Lavoie, B. (2000). Meeting the challenges of digital preservation: The OAIS reference model)
      - OAIS-compliance
          - Adherence to ISO 14721:2003 or (now) ISO 14721:2012
          - Specifies conceptual framework, functional model, information model
  • 9. Descriptive metadata
      - Supporting humans and machines
      - Goal: interpreting the data object
      - Not always possible to automatically interpret data objects on any level (some are fully opaque)
      - Consider:
          - 'Unstructured' natural-language texts, such as letters, books, articles
          - Images of artworks
          - Images of letters
          - Recordings of audiovisual presentations
          - Complex data files
  • 11. Automated metadata extraction
      - Popular view on indexing metadata:
          - “the more, the merrier”
      - Risks of low-quality metadata:
          - Low accuracy on search and browse tasks; occasionally embarrassing misinterpretations
      - Benefits:
          - Additional metadata can improve search indexing
  • 12. How good is automated metadata extraction?
      - Varies significantly depending on the precise task and source material
      - Automated metadata extraction tends to apply probabilistic (machine learning) or heuristic approaches
      - Machine-eye view:
          - Describe what is present
          - Infer what is not present, based on:
              - Knowledge base
              - Comparison with other items
              - Learning from training examples ('supervised learning')
  • 13. Crowdsourcing metadata
      - The 'phone a friend' approach to metadata generation
          - Make material available to the public
          - Encourage them to annotate (example: social tagging)
          - Examine the result
      - The likely result:
          - Some material extensively annotated; some descriptive annotations; some formally structured; some personal ('cryptic')
          - Some/most material receives no notice and is not annotated at all
      - Mitigation: engineer more consistent coverage through, for example, gamification (see Galaxy Zoo)
      - Identify incentives that encourage the public to contribute
  • 14. Capturing 'live' metadata
      - If the environment is accessible at the time of creation:
          - Technical 'live' metadata may be captured
          - Within PERICLES, this is referred to as 'significant environment information'
          - Example: steps in creation, time of creation, contextual relevance of other files…
      - Another sort of 'live' metadata emerges from observation of behaviour of those engaging with the data
          - Interaction with search/browse interfaces (cf. information scent)
          - Satisfaction with results
          - Patterns of sharing and reuse (information diffusion on social networks, for example)
  • 16. Theoretical reach of information
  • 17. Theoretical reach of information (image source: S Korotkiy)
  • 18. Practical reach of information
      - Receiving the signal is only the start
      - Can we decode the signal?
          - Technical decoding
          - Practical comprehension
      - Confounding factors in decoding metadata:
          - Language
          - Dialect
          - Prerequisite knowledge
  • 20. Language change: Time travel
      - Language may be viewed as a complex adaptive system (Beckner et al, 2007)
          - Made up of many tiny parts: people talking, writing, gesturing
          - Adaptive, because we change our behaviour based on past interactions
          - Many factors influence its development: biology of perception; social structure; experience
      - Probabilistic processes underlie language change: collective experience and eventual consensus
  • 21. Example: Photogram (Getty Art & Architecture Thesaurus)
  • 22. The challenges of decreasing accessibility
  • 23. Understanding unfamiliar material
      - Unfamiliar data
          - Technical encoding: well-understood problems
          - Challenges of internationalisation
      - Unfamiliar texts
          - Conventions and best practices change over time
          - Coherence degrades long before it fails entirely (texts become slower to read and take more effort; machines trained on modern texts are likely to struggle with texts from outside that timeframe)
      - Challenges of unfamiliar artefacts
          - There are many more questions that may be asked about an object: for example, in the case of artworks, “artist's intent” may be significant
          - Once lost, these are very difficult to infer
  • 24. Term recognition vs generation
      - Understanding unfamiliar material, though hard, is easier than finding it
      - Separate processes:
          - Recognising a term
          - Identifying (generating) a term
      - Recognition is faster and more reliable
      - Why:
          - Recognising a term: connecting the term to a concept
          - Generating terms: searching around a concept, looking through a large pool of candidate terms for the one that might work best here
          - Think yourself into the curator's shoes: what terms might they have used for the concept that interests you, and why?
  • 26. Relating concept, feature, agent and term
      - Peirce: semiotic triad, relating symbol, object and interpreter
          - Software agents: machine-level features (machine perception): words found in documents, colours, shapes or patterns found in images…
          - Human agents: perception; comprehension; application of relevant knowledge; interpretation into a set of concepts; encoding observations into terms
      - Observing the behaviour of human agents throughout the lifecycle of the digital object allows us to study change in manual interpretation and encoding
      - This permits us to characterise these patterns of change
      - It also permits software agents to be brought into line with changing norms
  • 28. Conclusion
      - PERICLES combines:
          - Model-led approaches to data management
          - Data-led approaches to modelling and characterising the changing environment and context(s) of reuse
      - Approach acknowledges the dynamic nature of the system in which reuse occurs
      - Downside: such an approach requires ongoing availability of material (ethically) gleaned from observational data
          - Consequently, a closed archive or an archive that excites little interest remains difficult to sustain, unless data is sourced elsewhere
      - In conclusion, therefore, data-led approaches gain from joint infrastructure and open data
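
The "machine-eye view" on slide 12 (describe what is present, then infer what is not from a knowledge base) can be illustrated with a minimal Python sketch. This is not the PERICLES tooling: the `KNOWLEDGE_BASE` table, the field names and the `extract_metadata` function are hypothetical, chosen only to show a heuristic extractor at its simplest.

```python
import re

# Hypothetical knowledge base: indicator term -> subject label.
# A real system would draw on a curated vocabulary instead.
KNOWLEDGE_BASE = {
    "photogram": "photography",
    "installation": "digital art",
    "telemetry": "space science",
}

# Heuristic: four-digit numbers from 1800 to 2099 are treated as years.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def extract_metadata(text):
    """Return heuristic descriptive metadata for a plain-text object."""
    words = re.findall(r"[a-z]+", text.lower())
    # "Describe what is present": surface features found in the text itself.
    years = sorted(set(YEAR_PATTERN.findall(text)))
    # "Infer what is not": subject labels looked up in the knowledge base.
    subjects = sorted({KNOWLEDGE_BASE[w] for w in words if w in KNOWLEDGE_BASE})
    return {
        "years_mentioned": years,
        "inferred_subjects": subjects,
        "word_count": len(words),
    }


record = extract_metadata("A photogram made in 1922 and exhibited in 2015.")
print(record)
```

Because the lookup table is fixed, an extractor of this kind goes stale as terminology shifts, which is the motivation for the data-led updating described in the conclusion.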