SlideShare ist ein Scribd-Unternehmen logo
1 von 28
GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics
[Digital Preservation]
“This project has received funding from the European Union’s Seventh
Framework Programme for research, technological development and
demonstration under grant agreement no601138”.
Semi-automated metadata extraction
in the long term
Emma Tonkin, King’s College London
DPC Workshop, Belfast, Dec 2015
Structure of presentation
2
 Introduction to Pericles
 Layers of Metadata
 Sources of Metadata
 Time, space and data
 Semi-automated metadata as mitigating factor
Introduction to Pericles
Introduction to PERICLES
4
 Four-year Integrated Project (2013-2017) funded by the European Union
under its Seventh Framework Programme
 Promoting and Enhancing Reuse of Information throughout the Content
Lifecycle taking account of Evolving Semantics
 Two (or three) domains:
− Digital artworks, such as interactive software-based installations, and
other digital media from Tate's collections
− Material from Tate's archives
− Experimental scientific data originating from the European Space
Agency and International Space Station.
Model-driven approach
5
 Essentially all archives are based around some conceptual model of the
material held
 PERICLES applies formal models to describe
− Objects
− Entities associated with objects
− Broader community
 These models support processes such as appraisal and QA, and
consequentially functionality such as maintenance and actions taken for
sustainability
 A broad variety of models are under consideration: semantic (ontological)
models to formally describe objects; social network graphs to describe
community; statistical models to describe technology obsolescence...
Layers of Metadata
Open Archival Information System
7
 OAIS reference model
− “conceptual framework for an archival system dedicated to preserving
and maintaining access to digital information over the long term“
-Lavoie, B. (2000). Meeting the challenges of digital preservation: The OAIS reference model
 OAIS-compliance
− adherence to ISO 14721:2003 or (now) ISO 14721:2012
− Specifies conceptual framework, functional model, information model
OAIS information model
8
Descriptive metadata
9
 Supporting humans and machines
 Goal: interpreting data object
 Not always possible to automatically interpret data objects on any level
(some are fully opaque)
 Consider:
− 'Unstructured' natural-language texts, such as letters, books, articles
− Images of artworks
− Images of letters
− Recordings of audiovisual presentations
− Complex data files
Sources of descriptive metadata
Automated metadata extraction
11
 Popular view on indexing metadata:
− “the more, the merrier”
 Risks of low-quality metadata:
− Low accuracy on search and browse tasks; occasionally embarrassing
misinterpretations
 Benefits:
− Additional metadata can improve search indexing
How good is automated metadata
extraction?
12
 Varies significantly depending on the precise task and source material
 Automated metadata extraction tends to apply probabilistic (machine
learning) or heuristic approaches
 Machine-eye view:
− describe what is present
− Infer what is not based on:
 Knowledge base
 Comparison with other items
 Learning from training examples ('supervised learning')
Crowdsourcing metadata
13
 The 'phone a friend' approach to metadata generation
− Make material available to public
− Encourage them to annotate (example: social tagging)
− Examine the result
 The likely result:
− Some material extensively annotated; some descriptive annotations;
some formally structured; some personal ('cryptic')
− Some/most material receives no notice and is not annotated at all
 Mitigation: engineer more consistent coverage through, for example,
gamification (see Galaxy Zoo)
 Identify incentives that encourage public to contribute
Capturing 'live' metadata
14
 If the environment is accessible at the time of creation:
− Technical 'live' metadata may be captured
− Within Pericles, this is referred to as 'significant environment
information'
− Example: steps in creation, time of creation, contextual relevance of
other files…
 Another sort of 'live' metadata emerges from observation of behaviour of
those engaging with the data
− Interaction with search/browse interfaces (cf. information scent)
− Satisfaction with results
− Patterns of sharing and reuse (information diffusion on social networks,
for example)
Time, space and data
Theoretical reach of information
16
Theoretical reach of information
17
Image source: S Korotkiy
 Receiving the signal is only the start
 Can we decode the signal?
− Technical decoding
− Practical comprehension
 Confounding factors in decoding metadata:
− Language
− Dialect
− Prerequisite knowledge
Practical reach of information
18
Language: space travel
19
Language change: Time travel
20
 Language may be viewed as a complex adaptive system (Beckner et al,
2007)
− Made up of many tiny parts - people talking, writing, gesturing
− Adaptive, because we change our behaviour based on past
interactions
− Many factors influence its development: biology of perception; social
structure; experience
 Probabilistic processes underlie language change: collective experience
and eventual consensus
Example: Photogram (Getty Art &
Architecture Thesaurus)
21
The challenges of decreasing
accessibility
 Unfamiliar data
− Technical encoding – well-understood problems
− Challenges of internationalisation
 Unfamiliar texts
− Conventions and best practices change over time
− Coherence degrades long before it fails entirely (slower to read: takes
more effort: machines trained on modern texts are likely to encounter
issues with texts outside that timeframe)
 Challenges of unfamiliar artefacts
− There are many more questions that may be asked about an object: for
example, in the case of artworks, “artist's intent” may be significant
− Once lost, these are very difficult to infer
Understanding unfamiliar material
23
 Understanding unfamiliar material, though hard, is easier than finding it
 Separate processes:
− Recognising a term
− Identifying (generating) a term
 Recognition is faster and more reliable
 Why:
− Recognising a term: connecting term to concept
− Generating terms: search around a concept looking through large pool
of candidate terms for the one that might work best here
− Think yourself into the curator's shoes: what terms might they have
used for the concept that interests you, and why?
Term recognition vs generation
24
Semi-automated metadata as
mitigating factor
 Peirce: semiotic triad, relating symbol, object and interpreter
− Software agents: machine-level features (machine perception) –
words found in documents, colours, shapes or patterns found in
images…
− Human agents: perception; comprehension; application of relevant
knowledge; interpretation into a set of concepts; encoding
observations into terms
 Observing the behaviour of human agents throughout the lifecycle of the
digital object allows us to study change in manual interpretation and
encoding
 This permits us to characterise these patterns of change
 It also permits software agents to be brought into line with changing norms
Relating concept, feature, agent and
term
26
Conclusion
 PERICLES combines
− model-led approaches to data management
− data-led approaches to modelling and characterising the changing
environment and context(s) of reuse
 Approach acknowledges dynamical nature of system in which reuse occurs
 Downside: such an approach requires ongoing availability of material
(ethically) gleaned from observational data
− Consequentially, a closed archive or an archive that excites little
interest remains difficult to sustain, unless data is sourced elsewhere
 In conclusion, therefore, data-led approaches gain from joint infrastructure
and open data
Conclusion

Weitere ähnliche Inhalte

Was ist angesagt?

Coreon - Making Sure IoT Devices Understand Each Other!
Coreon - Making Sure IoT Devices Understand Each Other!Coreon - Making Sure IoT Devices Understand Each Other!
Coreon - Making Sure IoT Devices Understand Each Other!Jochen Hummel
 
Sands Fish - Knowing in the Age of Networked Knowledge
Sands Fish - Knowing in the Age of Networked KnowledgeSands Fish - Knowing in the Age of Networked Knowledge
Sands Fish - Knowing in the Age of Networked Knowledgesandsfish
 
KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...
KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...
KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...JISC KeepIt project
 
Knowledge Representation essay outline
Knowledge Representation essay outlineKnowledge Representation essay outline
Knowledge Representation essay outlinemondesser
 
Policy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protectionPolicy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protectionSven Lieber
 
Tomas Singliar
Tomas SingliarTomas Singliar
Tomas Singliarbutest
 
Hypertext System
Hypertext SystemHypertext System
Hypertext Systemayina_11
 
DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...
DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...
DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...Dejan Kovachev
 
Internet Security for Beginners
Internet Security for Beginners Internet Security for Beginners
Internet Security for Beginners chee wai wong
 

Was ist angesagt? (11)

Getaneh Alemu
Getaneh AlemuGetaneh Alemu
Getaneh Alemu
 
Coreon - Making Sure IoT Devices Understand Each Other!
Coreon - Making Sure IoT Devices Understand Each Other!Coreon - Making Sure IoT Devices Understand Each Other!
Coreon - Making Sure IoT Devices Understand Each Other!
 
Sands Fish - Knowing in the Age of Networked Knowledge
Sands Fish - Knowing in the Age of Networked KnowledgeSands Fish - Knowing in the Age of Networked Knowledge
Sands Fish - Knowing in the Age of Networked Knowledge
 
KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...
KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...
KeepIt Course 4: digital preservation recap, by Andreas Rauber, Hannes Kulovi...
 
Knowledge Representation essay outline
Knowledge Representation essay outlineKnowledge Representation essay outline
Knowledge Representation essay outline
 
Policy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protectionPolicy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protection
 
Tomas Singliar
Tomas SingliarTomas Singliar
Tomas Singliar
 
Hypertext System
Hypertext SystemHypertext System
Hypertext System
 
DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...
DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...
DireWolf - Distributing and Migrating User Interfaces for Widget-based Web Ap...
 
Hypertext system
Hypertext systemHypertext system
Hypertext system
 
Internet Security for Beginners
Internet Security for Beginners Internet Security for Beginners
Internet Security for Beginners
 

Andere mochten auch

Preservation Metadata Initiatives and Standards
Preservation Metadata Initiatives and StandardsPreservation Metadata Initiatives and Standards
Preservation Metadata Initiatives and StandardsKEEP_project
 
The Reference Model for an Open Archival Information System (OAIS)
The Reference Model for an Open Archival Information System (OAIS)The Reference Model for an Open Archival Information System (OAIS)
The Reference Model for an Open Archival Information System (OAIS)Michael Day
 
20 Years, 20 Recipes - Happy Thanksgiving from Aristotle
20 Years, 20 Recipes -  Happy Thanksgiving from Aristotle20 Years, 20 Recipes -  Happy Thanksgiving from Aristotle
20 Years, 20 Recipes - Happy Thanksgiving from AristotleAristotle, Inc.
 
Aprovação do governo no município de São Paulo - Maio 2016
Aprovação do governo no município de São Paulo - Maio 2016Aprovação do governo no município de São Paulo - Maio 2016
Aprovação do governo no município de São Paulo - Maio 2016Miguel Rosario
 
Introduction to the Reference Model for an Open Archival Information System (...
Introduction to the Reference Model for an Open Archival Information System (...Introduction to the Reference Model for an Open Archival Information System (...
Introduction to the Reference Model for an Open Archival Information System (...Michael Day
 

Andere mochten auch (6)

Oais
OaisOais
Oais
 
Preservation Metadata Initiatives and Standards
Preservation Metadata Initiatives and StandardsPreservation Metadata Initiatives and Standards
Preservation Metadata Initiatives and Standards
 
The Reference Model for an Open Archival Information System (OAIS)
The Reference Model for an Open Archival Information System (OAIS)The Reference Model for an Open Archival Information System (OAIS)
The Reference Model for an Open Archival Information System (OAIS)
 
20 Years, 20 Recipes - Happy Thanksgiving from Aristotle
20 Years, 20 Recipes -  Happy Thanksgiving from Aristotle20 Years, 20 Recipes -  Happy Thanksgiving from Aristotle
20 Years, 20 Recipes - Happy Thanksgiving from Aristotle
 
Aprovação do governo no município de São Paulo - Maio 2016
Aprovação do governo no município de São Paulo - Maio 2016Aprovação do governo no município de São Paulo - Maio 2016
Aprovação do governo no município de São Paulo - Maio 2016
 
Introduction to the Reference Model for an Open Archival Information System (...
Introduction to the Reference Model for an Open Archival Information System (...Introduction to the Reference Model for an Open Archival Information System (...
Introduction to the Reference Model for an Open Archival Information System (...
 

Ähnlich wie Semi-automated metadata extraction in the long-term

Hans Hofman - European Perspectives on Digital Preservation
Hans Hofman - European Perspectives on Digital PreservationHans Hofman - European Perspectives on Digital Preservation
Hans Hofman - European Perspectives on Digital PreservationNational Digital Forum
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...tobold
 
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...Sebastian Dennerlein
 
Moving forward data centric sciences weaving AI, Big Data & HPC
Moving forward data centric sciences  weaving AI, Big Data & HPCMoving forward data centric sciences  weaving AI, Big Data & HPC
Moving forward data centric sciences weaving AI, Big Data & HPCGenoveva Vargas-Solar
 
Context culture metadata_openscout20120301
Context culture metadata_openscout20120301Context culture metadata_openscout20120301
Context culture metadata_openscout20120301Jan Pawlowski
 
PATHS state of the art monitoring report
PATHS state of the art monitoring reportPATHS state of the art monitoring report
PATHS state of the art monitoring reportpathsproject
 
Digital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and RequirementsDigital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and RequirementsDigitalPreservationEurope
 
Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Sandro D'Elia
 
Capitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger DataCapitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger DataShalin Hai-Jew
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinarJim Jansen
 
Web analytics presentation
Web analytics presentationWeb analytics presentation
Web analytics presentationJim Jansen
 

Ähnlich wie Semi-automated metadata extraction in the long-term (20)

Hans Hofman - European Perspectives on Digital Preservation
Hans Hofman - European Perspectives on Digital PreservationHans Hofman - European Perspectives on Digital Preservation
Hans Hofman - European Perspectives on Digital Preservation
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
Trm Introduction
Trm IntroductionTrm Introduction
Trm Introduction
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Aggregation as tactic sm new
Aggregation as tactic sm newAggregation as tactic sm new
Aggregation as tactic sm new
 
Aggregation as Tactic
Aggregation as TacticAggregation as Tactic
Aggregation as Tactic
 
The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...
 
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
 
20130222 kaptur training_goldsmiths
20130222 kaptur training_goldsmiths20130222 kaptur training_goldsmiths
20130222 kaptur training_goldsmiths
 
Moving forward data centric sciences weaving AI, Big Data & HPC
Moving forward data centric sciences  weaving AI, Big Data & HPCMoving forward data centric sciences  weaving AI, Big Data & HPC
Moving forward data centric sciences weaving AI, Big Data & HPC
 
Trm Training Overview Planets
Trm Training Overview PlanetsTrm Training Overview Planets
Trm Training Overview Planets
 
Context culture metadata_openscout20120301
Context culture metadata_openscout20120301Context culture metadata_openscout20120301
Context culture metadata_openscout20120301
 
Research Data MANTRA
Research Data MANTRAResearch Data MANTRA
Research Data MANTRA
 
PATHS state of the art monitoring report
PATHS state of the art monitoring reportPATHS state of the art monitoring report
PATHS state of the art monitoring report
 
Digital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and RequirementsDigital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and Requirements
 
Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708
 
Capitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger DataCapitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger Data
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinar
 
Research Data Mantra - March 2011
Research Data Mantra - March 2011Research Data Mantra - March 2011
Research Data Mantra - March 2011
 
Web analytics presentation
Web analytics presentationWeb analytics presentation
Web analytics presentation
 

Mehr von PERICLES_FP7

Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17PERICLES_FP7
 
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...PERICLES_FP7
 
Technical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshopTechnical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshopPERICLES_FP7
 
ForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information ModelForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information ModelPERICLES_FP7
 
Data quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspectiveData quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspectivePERICLES_FP7
 
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...PERICLES_FP7
 
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...PERICLES_FP7
 
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016PERICLES_FP7
 
Filling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on ChangeFilling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on ChangePERICLES_FP7
 
Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...PERICLES_FP7
 
Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016PERICLES_FP7
 
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...PERICLES_FP7
 
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...PERICLES_FP7
 
Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016PERICLES_FP7
 
Reproducibile scientific workflows - Acting on Change 2016
Reproducibile scientific workflows - Acting on Change 2016Reproducibile scientific workflows - Acting on Change 2016
Reproducibile scientific workflows - Acting on Change 2016PERICLES_FP7
 
Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...PERICLES_FP7
 
PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...PERICLES_FP7
 
PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016PERICLES_FP7
 
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016PERICLES_FP7
 
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...PERICLES_FP7
 

Mehr von PERICLES_FP7 (20)

Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17
 
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
 
Technical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshopTechnical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshop
 
ForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information ModelForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information Model
 
Data quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspectiveData quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspective
 
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
 
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
 
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
 
Filling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on ChangeFilling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on Change
 
Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...
 
Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016
 
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
 
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
 
Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016
 
Reproducibile scientific workflows - Acting on Change 2016
Reproducibile scientific workflows - Acting on Change 2016Reproducibile scientific workflows - Acting on Change 2016
Reproducibile scientific workflows - Acting on Change 2016
 
Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...
 
PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...
 
PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016
 
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
 
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
 

Kürzlich hochgeladen

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Kürzlich hochgeladen (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Semi-automated metadata extraction in the long-term

  • 1. GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation] “This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no601138”. Semi-automated metadata extraction in the long term Emma Tonkin, King’s College London DPC Workshop, Belfast, Dec 2015
  • 2. Structure of presentation 2  Introduction to Pericles  Layers of Metadata  Sources of Metadata  Time, space and data  Semi-automated metadata as mitigating factor
  • 4. Introduction to PERICLES 4  Four-year Integrated Project (2013-2017) funded by the European Union under its Seventh Framework Programme  Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics  Two (or three) domains: − Digital artworks, such as interactive software-based installations, and other digital media from Tate's collections − Material from Tate's archives − Experimental scientific data originating from the European Space Agency and International Space Station.
  • 5. Model-driven approach 5  Essentially all archives are based around some conceptual model of the material held  PERICLES applies formal models to describe − Objects − Entities associated with objects − Broader community  These models support processes such as appraisal and QA, and consequentially functionality such as maintenance and actions taken for sustainability  A broad variety of models are under consideration: semantic (ontological) models to formally describe objects; social network graphs to describe community; statistical models to describe technology obsolescence...
  • 7. Open Archival Information System 7  OAIS reference model − “conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term“ -Lavoie, B. (2000). Meeting the challenges of digital preservation: The OAIS reference model  OAIS-compliance − adherence to ISO 14721:2003 or (now) ISO 14721:2012 − Specifies conceptual framework, functional model, information model
  • 9. Descriptive metadata 9  Supporting humans and machines  Goal: interpreting data object  Not always possible to automatically interpret data objects on any level (some are fully opaque)  Consider: − 'Unstructured' natural-language texts, such as letters, books, articles − Images of artworks − Images of letters − Recordings of audiovisual presentations − Complex data files
  • 11. Automated metadata extraction 11  Popular view on indexing metadata: − “the more, the merrier”  Risks of low-quality metadata: − Low accuracy on search and browse tasks; occasionally embarrassing misinterpretations  Benefits: − Additional metadata can improve search indexing
  • 12. How good is automated metadata extraction? 12  Varies significantly depending on the precise task and source material  Automated metadata extraction tends to apply probabilistic (machine learning) or heuristic approaches  Machine-eye view: − describe what is present − Infer what is not based on:  Knowledge base  Comparison with other items  Learning from training examples ('supervised learning')
  • 13. Crowdsourcing metadata 13  The 'phone a friend' approach to metadata generation − Make material available to public − Encourage them to annotate (example: social tagging) − Examine the result  The likely result: − Some material extensively annotated; some descriptive annotations; some formally structured; some personal ('cryptic') − Some/most material receives no notice and is not annotated at all  Mitigation: engineer more consistent coverage through, for example, gamification (see Galaxy Zoo)  Identify incentives that encourage public to contribute
  • 14. Capturing 'live' metadata 14  If the environment is accessible at the time of creation: − Technical 'live' metadata may be captured − Within Pericles, this is referred to as 'significant environment information' − Example: steps in creation, time of creation, contextual relevance of other files…  Another sort of 'live' metadata emerges from observation of behaviour of those engaging with the data − Interaction with search/browse interfaces (cf. information scent) − Satisfaction with results − Patterns of sharing and reuse (information diffusion on social networks, for example)
  • 16. Theoretical reach of information 16
  • 17. Theoretical reach of information 17 Image source: S Korotkiy
  • 18.  Receiving the signal is only the start  Can we decode the signal? − Technical decoding − Practical comprehension  Confounding factors in decoding metadata: − Language − Dialect − Prerequisite knowledge Practical reach of information 18
  • 20. Language change: Time travel 20  Language may be viewed as a complex adaptive system (Beckner et al, 2007) − Made up of many tiny parts - people talking, writing, gesturing − Adaptive, because we change our behaviour based on past interactions − Many factors influence its development: biology of perception; social structure; experience  Probabilistic processes underlie language change: collective experience and eventual consensus
  • 21. Example: Photogram (Getty Art & Architecture Thesaurus) 21
  • 22. The challenges of decreasing accessibility
  • 23.  Unfamiliar data − Technical encoding – well-understood problems − Challenges of internationalisation  Unfamiliar texts − Conventions and best practices change over time − Coherence degrades long before it fails entirely (slower to read: takes more effort: machines trained on modern texts are likely to encounter issues with texts outside that timeframe)  Challenges of unfamiliar artefacts − There are many more questions that may be asked about an object: for example, in the case of artworks, “artist's intent” may be significant − Once lost, these are very difficult to infer Understanding unfamiliar material 23
  • 24.  Understanding unfamiliar material, though hard, is easier than finding it  Separate processes: − Recognising a term − Identifying (generating) a term  Recognition is faster and more reliable  Why: − Recognising a term: connecting term to concept − Generating terms: search around a concept looking through large pool of candidate terms for the one that might work best here − Think yourself into the curator's shoes: what terms might they have used for the concept that interests you, and why? Term recognition vs generation 24
  • 26.  Peirce: semiotic triad, relating symbol, object and interpreter − Software agents: machine-level features (machine perception) – words found in documents, colours, shapes or patterns found in images… − Human agents: perception; comprehension; application of relevant knowledge; interpretation into a set of concepts; encoding observations into terms  Observing the behaviour of human agents throughout the lifecycle of the digital object allows us to study change in manual interpretation and encoding  This permits us to characterise these patterns of change  It also permits software agents to be brought into line with changing norms Relating concept, feature, agent and term 26
  • 28.  PERICLES combines − model-led approaches to data management − data-led approaches to modelling and characterising the changing environment and context(s) of reuse  Approach acknowledges dynamical nature of system in which reuse occurs  Downside: such an approach requires ongoing availability of material (ethically) gleaned from observational data − Consequentially, a closed archive or an archive that excites little interest remains difficult to sustain, unless data is sourced elsewhere  In conclusion, therefore, data-led approaches gain from joint infrastructure and open data Conclusion