PRELIDA Project
Sotiris Batsakis & Grigoris Antoniou
University of Huddersfield
Overview
1. Background
1. Digital Preservation
2. Linked Data
2. Use Case
1. DBpedia
3. Identified Gap
4. Roadmap
1. Dataset Ingestion
2. Dealing with Changes
5. Recommendations
1.1 Digital Preservation introduction
 “Activities ensuring access to digital objects (data and
software) as long as it is required. In addition to that,
preserved content must be authenticated and
rendered properly upon request.”
 Problems
 File format obsolescence
 Storage medium failure
 Value and function of the digital object cannot be determined
anymore
 Simple loss of the digital objects
Preservation Recommendations
 Using file formats based on open standards
 Using the services of digital archives to store the
objects for the long-term
 Creating and maintaining high quality documentation
specifically developed to create preservation metadata
so in the future the digital objects can be reused
 Making use of multiple storage facilities to reduce the
risk that the objects get lost.
OAIS reference model
 OAIS reference model (Reference Model for an Open
Archival Information System) establishes a common
framework of terms and concepts relevant for the long
term archiving of digital data.
 The model details the processes around and inside the
archive, including the interaction with the user.
However, it does not make any statements about which
data need to be preserved.
 The OAIS reference model has been adopted as ISO
standard 14721
OAIS definitions
 OAIS is defined as an archive and an organisation of
people and systems that has accepted the
responsibility to preserve information and make it
available for a “Designated Community”
 A Designated Community is defined as “an identified
group of potential consumers who should be able to
understand a particular set of information. The
Designated Community may be composed of multiple
user communities”.
OAIS Archival Information Package
1.2 Linked Data
 Use the Web as a platform to publish and re-use
identifiers that refer to data,
 Use a standard data model for expressing the data
(RDF).
Publishing RDF Data
 As annotation to Web documents
 RDF data is included within the HTML code of Web pages.
 Software with suitable parsers can then extract the RDF
content from the pages.
 As Web documents
 RDF data is serialized and stored on the Web.
 RDF documents are served next to HTML documents and a
machine can request specific type of documents.
 As a database
 RDF can be stored in optimised graph databases (“triple
store”) and queried using the SPARQL query language.
Publishing RDF data example
 DBpedia publishing:
 As annotations through the RDFa markup present in the HTML
page
 As RDF content via content-negotiation with the resource
 With a SPARQL query sent to the end point
Preservation of Linked Data
 Web Data can be preserved just like any web page,
especially if there is structured data embedded in it (RDFa,
Microdata …).
 It is possible to extract structured data from any Web page
that contains annotations in order to expose it to the user via
various serialisation formats.
 Database Data can be preserved just like any database. RDF
is to be considered as the raw bits of information which are
serialised in RDF/XML (among others).
 The preservation of such files is similar to what would be
done for relational databases with the goal of providing data
consumers with a serialisation format that can be consumed
with current software.
Semantics
 The archive of a Web document consists of its own
text and the other Web resources embedded in it. This
view differs for a Web of resources, where the links
between the resources matter and evolve over time.
 On a global graph interconnecting several data sources
through shared conceptualization, this context is
infinite. The only way to preserve the Web of Data in a
meaningful way would be to snapshot it entirely, a
scenario that is intractable from an architectural point
of view.
2 DBpedia introduction
 DBpedia's objective is to extract structured knowledge
from Wikipedia and make it freely available on the Web
using Semantic Web and Linked Data technologies.
 Data is extracted in RDF format
 Can be retrieved
 Directly (RDF)
 Through a SPARQL end-point
 As Web pages.
 Knowledge from different language editions of Wikipedia
is extracted along with links to other Linked Open Data
datasets.
LOD cloud
DBpedia data extraction
 Wikipedia content is available under the Creative Commons
Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free
Documentation License (GFDL).
 DBpedia content (data and metadata such as the DBpedia ontology) is
available to end users under the same terms and licences as the
Wikipedia content.
DBpedia preservation (1/2)
 DBpedia preserves different versions of the entire dataset
by means of DBpedia dumps, which act as a
versioning mechanism
 Besides the versioned dumps, DBpedia Live
keeps track of changes in Wikipedia and extracts newly
changed information from Wikipedia infoboxes and text
into RDF format
 DBpedia Live also contains metadata about the part of the
Wikipedia text from which the information was extracted, the user
who created or modified the corresponding data, and the date of
creation or last modification. Incremental modifications of
DBpedia Live are also archived
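Such incremental modifications can be replayed against a local archive copy. The sketch below assumes DBpedia Live-style changesets, i.e. paired files of added and removed triples in N-Triples format; the file names are hypothetical.

```python
# Sketch: apply one incremental changeset to a local archive copy.
from rdflib import Graph

def apply_changeset(archive: Graph, added_path: str, removed_path: str) -> None:
    """Remove the outdated triples, then add the newly extracted ones."""
    removed = Graph().parse(removed_path, format="nt")
    added = Graph().parse(added_path, format="nt")
    for triple in removed:
        archive.remove(triple)
    for triple in added:
        archive.add(triple)

archive = Graph().parse("dbpedia_snapshot.nt", format="nt")  # hypothetical dump
apply_changeset(archive, "000001.added.nt", "000001.removed.nt")
```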
DBpedia preservation (2/2)
 The DBpedia dataset contains links to other datasets containing
both definitions and data (e.g., Geonames).
 There are currently more than 27 million links from DBpedia
to other datasets.
 DBpedia archiving mechanisms also preserve links to these
datasets but not their content.
 Preserved data is DBpedia content in RDF or tables (CSV)
format.
 Rendering and querying software is not part of the archive,
although the software that extracts data from Wikipedia infoboxes
and text to create the DBpedia dataset is preserved on
GitHub.
2.1 DBpedia Use Cases
 Based on possible interactions and user requests:
 Request of archived data in RDF or CSV format
 Request of rendered data in Web format
 Submitting SPARQL queries on the archived versions of
the data
 Time
 Point
 Interval
 External Sources involved
Use Case 1-RDF Data archiving and retrieval
 DBpedia data (in RDF format, or Tables-CSV format)
is archived and the user requests specific data (or the
entire dataset) as it was at a specific date in the past.
 The preservation mechanism must be able to provide
the requested data in RDF (or Table) format.
 Timestamps specifying the interval during which the data was
valid (i.e., the date the data was last created or modified
before the date specified in the user
request, and the first modification or deletion date after that
date) are a desirable feature of the mechanism.
Use Case 1
 Retrieving data for a specific time interval, e.g., 2010-2012,
instead of a specific date is a more complex case
since all versions of the data and their corresponding
validity intervals with respect to the request interval
must be returned.
 Currently complex requests involving intervals are not
handled by the DBpedia archiving mechanism. RDF
data containing links to other LOD for specific time
points can be retrieved, but the content of links to
external LOD is not preserved.
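A minimal sketch of what point and interval retrieval could look like over dated snapshots, in plain Python; the snapshot identifiers, dates, and the in-memory index are all illustrative assumptions.

```python
# Sketch: point and interval retrieval over dated snapshots.
from datetime import date

# Each snapshot is (valid_from, valid_until, dataset_id); valid_until is
# the date of the next modification (None for the current version).
snapshots = [
    (date(2009, 1, 1), date(2010, 6, 1), "dbpedia-3.2"),
    (date(2010, 6, 1), date(2011, 9, 1), "dbpedia-3.5"),
    (date(2011, 9, 1), None,             "dbpedia-3.7"),
]

def version_at(point: date):
    """Point query: the single version valid at the given date."""
    for start, end, vid in snapshots:
        if start <= point and (end is None or point < end):
            return vid
    return None

def versions_in(start: date, end: date):
    """Interval query: all versions whose validity overlaps [start, end)."""
    return [(s, e, vid) for s, e, vid in snapshots
            if s < end and (e is None or e > start)]

print(version_at(date(2011, 1, 1)))                      # -> dbpedia-3.5
print(versions_in(date(2010, 1, 1), date(2012, 1, 1)))   # all three versions overlap
```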
Use case 2: Rendering data as Web page
 User requests the DBpedia data for a specific topic at a
given temporal point or interval presented in a Web
page format
 The preservation mechanism should be able to return
the data in RDF format; in case the description was
modified during the given interval, all corresponding
descriptions, the intervals during which each distinct
description was valid, the modification history, the differences
between versions, and the editors should be returned, as in
the first use case.
Use case 2
 Rendering requested data as a Web page introduces
the following problem: can the functionality of
external links be preserved and supported as well?
 Currently, rendering software is not part of the
preservation mechanism, although using a standard
representation (RDF) minimizes the risk of not being
able to render the data in the future.
Use case 3: SPARQL Endpoint functionality
 Reconstruct the functionality of the DBpedia SPARQL
endpoint at a specific temporal point in the past.
 The user specifies both the query and the time point (this use
case can be extended by supporting interval queries, as in use
cases 1 and 2 above). There are different kinds of queries
that must be handled, corresponding to different use cases:
a) Queries spanning RDF data within the DBpedia dataset only
b) Queries spanning the DBpedia dataset and datasets
directly connected to the DBpedia RDF dataset (e.g.,
Geonames)
c) Queries spanning DBpedia data and external
datasets connected indirectly with DBpedia (i.e., through
links to datasets of case b).
Use case 3
 Currently, SPARQL end-point functionality is not
directly preserved, i.e., users must retrieve the data
and use their own SPARQL end-point to query it,
as in the sketch below.
 They will then be able to issue queries of type (a)
above, but not queries of type (b) when the content of
external links is requested.
 Requests of type (c) also cannot be handled by the
current preservation mechanism
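A sketch of the current situation for type (a) queries: the user loads a retrieved dump into a local rdflib graph and queries it there. The dump file name and the query are illustrative.

```python
# Sketch: issue a type (a) query over retrieved archive data.
from rdflib import Graph

g = Graph()
g.parse("dbpedia_2012_dump.nt", format="nt")  # hypothetical archived dump

results = g.query("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?city ?population WHERE {
        ?city a dbo:City ;
              dbo:populationTotal ?population .
    } LIMIT 10
""")
for city, population in results:
    print(city, population)

# Type (b) and (c) queries would fail here: the content behind links to
# external datasets (e.g., Geonames) is not in the local graph.
```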
3. LD characteristics
 Digital objects can be:
 static vs dynamic
 complex vs simple
 active vs passive
 rendered vs non-rendered
 Linked Data is dynamic, complex, passive and typically
non-rendered.
LD is dynamic
 Do people need archived versions of LOD datasets, or is
the most up-to-date version all that is needed?
 Linked Data is usually dynamic, which is not the case in
many preservation applications for other types of
objects, so older versions should be preserved. This is
the case, for example, for DBpedia, where both older
versions and incremental changes are archived.
 Different statements may be made at any time and so
the “boundary” of the object under consideration
changes in time.
LD is Complex
 Linked Data is typically about expressing statements
(facts) whose truth or falsity is grounded to the context
provided by all the other statements available at that
particular moment.
 Linked Data is a form of formal knowledge. As for
any kind of data or information, the problem for long-
term preservation is not the preservation of an object
as such, but the preservation of the meaning of the
object.
LD rendering
 Typically, Linked Data is not rendered and adopts
standards such as RDF that are open, widely adopted,
and well supported.
 This simplifies the preservation mechanism if the only
preserved objects are the data themselves.
 This is the case in the current DBpedia archiving mechanism.
 On the other hand, if rendering is a requirement, then
appropriate software and perhaps platform emulators
must be preserved in addition to the data themselves.
LD is passive
 Linked Data is usually represented in the form of
statements (typically RDF triples), which are not
applications.
 Besides preserving the data, software that handles the data
should in some cases be preserved as well.
 Rendering functionality or an access method, such as a
SPARQL endpoint for archived DBpedia data, is an example
case not handled by the existing archiving mechanism.
Additional issues
 Linked Data is typically distributed, and the
persistence of the preserved objects depends on all the
individual parts and the ontologies/vocabularies with
which the data is expressed.
 A lot of data essentially depends on OWL
ontologies that are created/maintained/hosted by others
 Other issues include authenticity, uncertainty, web
infrastructure, accessibility, versioning, and the
preservation of metadata referring to triples
4. Draft Roadmap
 The notion of designated communities only helps
partially
 By default we cannot know what communities in the
future might be interested in the archived material
 This may be even more sensitive in the case of Linked
Data, which are typically created with an idea of sharing
in mind, and as such tend to be less community specific
 Linked Data shifts away from the paradigm of
self-containedness often assumed by archives
OAIS Compliance (1/2)
 An archived information package consists of content
and preservation description information.
 Content Information contains the data to be preserved
 Data Object is the data to be preserved
 Representation Information is information “needed to
make the Content Data Object understandable to the
Designated Community”
OAIS Compliance (2/2)
 Preservation Description Information includes various
types of knowledge required (or rather,
recommended) for the preservation of the content
 Provenance
 Context
 Linked Data refers to other resources outside the graph
it belongs to, so when archiving one graph, one
has to decide what to do with the links going out of the
graph
 An LD dataset is not an OAIS; it is, at best, just the content
part of an OAIS
4.1 Dataset Ingestion
 Ideally ingesting all dependencies
 Feasible for RDF and OWL vocabularies
 Defining boundary is a problem
 It exposes the archive to the risk of archiving a large portion of
the LOD cloud
 For all other kinds of resources, the same
boundary problem exists, and in addition there is the problem
of digital rights management (DRM)
Serialization
 Structure Information is given by (definition of) the
serialization format.
 There are many serializations (Unicode based)
 RDF 1.1 N-Quads?
 PRELIDA will seek to establish contact with the W3C
Data Best Practices group
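A sketch of migrating an archived dump to N-Quads with rdflib, assuming a hypothetical RDF/XML dump file. N-Quads additionally records the named graph of each triple, which is useful for keeping archived versions apart.

```python
# Sketch: migrate an archived RDF/XML dump to the N-Quads serialization.
from rdflib import Dataset, URIRef

ds = Dataset()
# One named graph per archived version (graph URI is illustrative).
g = ds.graph(URIRef("http://example.org/graph/dbpedia-3.7"))
g.parse("dbpedia_3.7.rdf", format="xml")  # hypothetical RDF/XML dump

# N-Quads is a Unicode-based, line-oriented format: one quad per line,
# each carrying its graph context.
ds.serialize(destination="dbpedia_3.7.nq", format="nquads")
```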
Dataset Description
 VoID vocabulary for RDF dataset metadata
 General metadata following the Dublin Core model.
 Access metadata describes how RDF data can be
accessed using various protocols.
 Structural metadata describes the structure and schema
of datasets and is useful for tasks such as querying and
data integration.
 Descriptions of links between datasets are helpful for
understanding how multiple datasets are related and
can be used together.
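A minimal VoID description assembled with rdflib; all URIs, titles, and counts are illustrative, and the VoID namespace is declared explicitly.

```python
# Sketch: a minimal VoID description covering general, access, and
# structural metadata (all values are examples).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
ds = URIRef("http://example.org/dataset/dbpedia-archive")

g.add((ds, RDF.type, VOID.Dataset))
# General metadata (Dublin Core)
g.add((ds, DCTERMS.title, Literal("Archived DBpedia snapshot")))
g.add((ds, DCTERMS.issued, Literal("2012-06-01")))
# Access metadata
g.add((ds, VOID.sparqlEndpoint, URIRef("http://example.org/sparql")))
g.add((ds, VOID.dataDump, URIRef("http://example.org/dumps/dbpedia.nq")))
# Structural metadata
g.add((ds, VOID.triples, Literal(333000000)))

print(g.serialize(format="turtle"))
```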
Data Catalogs
 DCAT is an RDF vocabulary designed to facilitate
interoperability between data catalogs published on
the Web
 Publishers increase discoverability
 Enables applications to easily consume metadata from
multiple catalogs
 Decentralized publishing of catalogs
 DCAT metadata can serve as a manifest file to facilitate
digital preservation
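A sketch of such a DCAT manifest for an archived dataset, again with rdflib; the catalog, dataset, and distribution URIs are illustrative.

```python
# Sketch: a DCAT manifest linking a catalog, a dataset, and one
# distribution (all URIs and values are examples).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
catalog = URIRef("http://example.org/catalog")
dataset = URIRef("http://example.org/dataset/dbpedia-archive")
dist = URIRef("http://example.org/dataset/dbpedia-archive/nq")

g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCAT.dataset, dataset))
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Archived DBpedia snapshot")))
g.add((dataset, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("http://example.org/dumps/dbpedia.nq")))
g.add((dist, DCAT.mediaType, Literal("application/n-quads")))

print(g.serialize(format="turtle"))
```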
Provenance Information
 The PROV vocabulary is recommended by the W3C
for expressing Provenance Information
 PROV is designed to represent
 how the RDF dataset was made
 what the history of this dataset is before and after
ingest
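A sketch of PROV statements recording how an archived dataset was made and when; the source, activity, and timestamp are illustrative.

```python
# Sketch: PROV statements for an archived dataset (values are examples).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
archived = URIRef("http://example.org/dataset/dbpedia-archive")
wikipedia = URIRef("http://example.org/source/wikipedia-dump")
extraction = URIRef("http://example.org/activity/extraction-run-42")

g.add((archived, RDF.type, PROV.Entity))
g.add((archived, PROV.wasDerivedFrom, wikipedia))   # where the data came from
g.add((archived, PROV.wasGeneratedBy, extraction))  # how it was made
g.add((archived, PROV.generatedAtTime,
       Literal("2012-06-01T00:00:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```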
Reasoner preservation
 Semantics of RDF/OWL
 RDF/OWL vocabulary the graph is built on
 Consider the preservation of related application
software
 OWL/RDF reasoners
 SPARQL endpoints
4.2 Changes to the technology
 Several hardware and software components are involved,
and any of these components may malfunction or
become obsolete
 It is the responsibility of the archive to monitor such
changes and take actions (such as migrating the
data to a new, W3C-recommended format, or to a new
medium)
 Standard preservation practices are adequate here.
W3C archiving recommendations (regarding format,
compatibility and data migration) will be required.
Changes to the Content Data
 Data in the information system change, meaning that
some element is deleted, or updated, or that a new
data element is created
 The DBpedia LDD is continuously updated
 Existing mechanisms and policies are adequate for
internal data, but links to external datasets are not
handled by existing archives
 When the owner of the data decides that the changes
are significant enough, a new snapshot of the data is
taken. The archive’s responsibility is to possibly keep
track of the versioning relationships
Changes to the Representation
Information
 Representation information may change, either because
the holding archive updates them or because some event
outside the holding archive requires a change to them.
 The serialization format of the preserved LDD becomes
obsolete, and the holding archive migrates the data to the
newly recommended format.
 The organization producing the data goes out of business,
and responsibility for the data is transferred to a different
organization
 Recommendations for detecting changes (crawling
strategies or notification mechanism) must be defined.
Changes to the vocabularies
 Vocabularies may change any time, due to the addition of
new terms (and of the involved axioms) or to the
deprecation of old terms.
 The data being preserved do not change directly, but the
change to the vocabularies may have an influence on their
semantics.
 This case is tackled by the joint action of the archive and
the data holder
 Since vocabularies are usually small in size, an aggressive
crawling strategy can be applied
 A set of centrally preserved core vocabularies is also an
option
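A sketch of such an aggressive crawling strategy using only the Python standard library: fetch each vocabulary document and store it under a timestamped name. The vocabulary list and the file naming scheme are assumptions.

```python
# Sketch: timestamped crawl of (small) vocabulary documents.
import urllib.request
from datetime import datetime, timezone

VOCABULARIES = [
    "http://xmlns.com/foaf/0.1/",
    "http://www.w3.org/ns/prov#",
]

def crawl(url: str) -> None:
    # Ask for an RDF serialization via content negotiation.
    req = urllib.request.Request(
        url, headers={"Accept": "text/turtle, application/rdf+xml"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    # Store under a name derived from the URL plus a UTC timestamp.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    name = url.rstrip("/#").rsplit("/", 1)[-1] or "vocab"
    with open(f"{name}-{stamp}.rdf", "wb") as out:
        out.write(body)

for v in VOCABULARIES:
    crawl(v)
```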
Changes to other web resources
 This case is tackled by the joint action of the archive
and the data holder, as in the case of external RDF
datasets and vocabularies
 This problem is similar to the Web archiving problem
and a complete technical solution for all cases is not
considered to be feasible. Nevertheless partial
solutions are feasible.
 Additional issues:
 Size
 DRM
Changes to the knowledge base
 A piece of information is considered by the OAIS as
usable if it can be understood based on the knowledge
base of the Designated Community.
 Knowledge base:
 Distributed
 Various formats (text, images)
 Similar issues with Web archiving
Dealing with changes
 Propagation of an ontology change to the archived
descriptions that use it.
 Lightweight approaches
 Add reference to some source explaining the difference
between the current and the previous notions
 Just indicate that there has been a change in the context
(vocabulary)
 Automated approaches
 Replace with new term
 Keep both old and new version
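A sketch of the two automated approaches over an rdflib graph: triples using a deprecated term are rewritten to the new term, optionally keeping the old version as well. The deprecation mapping is a hypothetical input supplied by the data holder.

```python
# Sketch: propagate vocabulary changes into archived descriptions.
from rdflib import Graph, URIRef

DEPRECATED = {  # old term -> new term (hypothetical mapping)
    URIRef("http://example.org/onto#oldName"): URIRef("http://example.org/onto#name"),
}

def propagate(g: Graph, keep_old: bool = False) -> None:
    """Replace deprecated terms; with keep_old=True, keep both versions."""
    for s, p, o in list(g):  # snapshot, since we mutate g while iterating
        new_p = DEPRECATED.get(p, p)
        new_o = DEPRECATED.get(o, o) if isinstance(o, URIRef) else o
        if (new_p, new_o) != (p, o):
            if not keep_old:
                g.remove((s, p, o))
            g.add((s, new_p, new_o))
```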
5. Recommendations (1/3)
 Selection and appraisal of data:
 Identify the boundaries of the LDD that has to be preserved
 Gather every RDF dataset that is relevant for the LDD to be
preserved
 The default strategy is complete closure, both for vocabularies
(ontologies) and instances.
 Whenever an LD dataset is collected into an archive, the owner
of the dataset should be alerted that any change in it is
relevant for the collector, who is added to a list of
subscribers that have to be notified of any change
Recommendations (2/3)
 Submit data in a standard serialization (such as N-Quads).
 Include VoID/DCAT/PROV descriptions in the Representation
Information
 Specification documents should also be preserved (i.e.,
RDFS, OWL, serialization specs)
 Time-stamps for the crawls of the collected datasets
 Reasoners and SPARQL engines (triple stores) are also to be
preserved for access purposes
 Submit every representation (HTML+RDFa, JSON) served
Recommendations (3/3)
 The distributed nature of Linked Data suggests that a
distributed approach may be more appropriate than a
traditional one for preservation of LDD.
 In such an approach, an OAIS can be spread over several
archives, each storing a part of the Content Data.
 This distributed structure would be more suitable for
archiving an LDD, whose references to external entities can
be managed as references to other OAISs managing those
entities.
Conclusion
 Linked Data and Linked Open Data are a specific form of
digital objects. The problem for LOD lies not with the
notation of the data model (RDF triples).
 The key differentiation is between LOD living on the Web,
whose main part consists of URIs pointing to Web resources,
and LD living in a database-like environment; the former
creates most of the problems for archiving.
 Any attempt to archive LOD as part of the living Web shares
the problems of archiving Web resources.
 Issues, recommendations, and best practices are outlined
here
 First step towards LD preservation roadmap
Thank you
 Questions?

More Related Content

What's hot

The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014Robert Meusel
 
Handout for Metadata for your Digital Collections
Handout for Metadata for your Digital CollectionsHandout for Metadata for your Digital Collections
Handout for Metadata for your Digital CollectionsJenn Riley
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesValeria Pesce
 
Performance Benchmarking of Key-Value Store NoSQL Databases
Performance Benchmarking of Key-Value Store NoSQL Databases Performance Benchmarking of Key-Value Store NoSQL Databases
Performance Benchmarking of Key-Value Store NoSQL Databases IJECEIAES
 
Open Archives Initiatives For Metadata Harvesting
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata HarvestingNikesh Narayanan
 
The importance of metadata for datasets: The DCAT-AP European standard
The importance of metadata for datasets: The DCAT-AP European standardThe importance of metadata for datasets: The DCAT-AP European standard
The importance of metadata for datasets: The DCAT-AP European standardGiorgia Lodi
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyinscit2006
 
B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)
B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)
B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)EUDAT
 
Building Collections in IRs from External Data Sources
Building Collections in IRs from External Data SourcesBuilding Collections in IRs from External Data Sources
Building Collections in IRs from External Data SourcesSusan Matveyeva
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesVyacheslav Tykhonov
 
Mending the Gap between Library's Electronic and Print Collections in ILS and...
Mending the Gap between Library's Electronic and Print Collections in ILS and...Mending the Gap between Library's Electronic and Print Collections in ILS and...
Mending the Gap between Library's Electronic and Print Collections in ILS and...New York University
 
Future of Cataloging, Classification
Future of Cataloging, ClassificationFuture of Cataloging, Classification
Future of Cataloging, ClassificationDenise Garofalo
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its servicesEUDAT
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data ApplicationsEUCLID project
 

What's hot (20)

The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
 
Handout for Metadata for your Digital Collections
Handout for Metadata for your Digital CollectionsHandout for Metadata for your Digital Collections
Handout for Metadata for your Digital Collections
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabularies
 
Performance Benchmarking of Key-Value Store NoSQL Databases
Performance Benchmarking of Key-Value Store NoSQL Databases Performance Benchmarking of Key-Value Store NoSQL Databases
Performance Benchmarking of Key-Value Store NoSQL Databases
 
Open Archives Initiatives For Metadata Harvesting
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata Harvesting
 
OAI and OAI-PMH
OAI and OAI-PMHOAI and OAI-PMH
OAI and OAI-PMH
 
Metadata: A concept
Metadata: A conceptMetadata: A concept
Metadata: A concept
 
The importance of metadata for datasets: The DCAT-AP European standard
The importance of metadata for datasets: The DCAT-AP European standardThe importance of metadata for datasets: The DCAT-AP European standard
The importance of metadata for datasets: The DCAT-AP European standard
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case study
 
B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)
B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)
B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)
 
Building Collections in IRs from External Data Sources
Building Collections in IRs from External Data SourcesBuilding Collections in IRs from External Data Sources
Building Collections in IRs from External Data Sources
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemes
 
Mending the Gap between Library's Electronic and Print Collections in ILS and...
Mending the Gap between Library's Electronic and Print Collections in ILS and...Mending the Gap between Library's Electronic and Print Collections in ILS and...
Mending the Gap between Library's Electronic and Print Collections in ILS and...
 
Future of Cataloging, Classification
Future of Cataloging, ClassificationFuture of Cataloging, Classification
Future of Cataloging, Classification
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its services
 
Distributed Databases Overview
Distributed Databases OverviewDistributed Databases Overview
Distributed Databases Overview
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
 
Overview of dbms
Overview of dbmsOverview of dbms
Overview of dbms
 

Similar to PRELIDA Project Draft Roadmap

Wed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservationsWed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservationseswcsummerschool
 
MR^3: Meta-Model Management based on RDFs Revision Reflection
MR^3: Meta-Model Management based on RDFs Revision ReflectionMR^3: Meta-Model Management based on RDFs Revision Reflection
MR^3: Meta-Model Management based on RDFs Revision ReflectionTakeshi Morita
 
Wed garcia hands_on_d_bpedia preservation
Wed garcia hands_on_d_bpedia preservationWed garcia hands_on_d_bpedia preservation
Wed garcia hands_on_d_bpedia preservationeswcsummerschool
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersNew York University
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...CONUL Conference
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015Cason Snow
 
RDTF Metadata Guidelines: an update
RDTF Metadata Guidelines: an updateRDTF Metadata Guidelines: an update
RDTF Metadata Guidelines: an updateAndy Powell
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataEDINA, University of Edinburgh
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Serverwebhostingguy
 
W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2nolmar01
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordJisc
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 

Similar to PRELIDA Project Draft Roadmap (20)

Wed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservationsWed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservations
 
Spotlight
SpotlightSpotlight
Spotlight
 
MR^3: Meta-Model Management based on RDFs Revision Reflection
MR^3: Meta-Model Management based on RDFs Revision ReflectionMR^3: Meta-Model Management based on RDFs Revision Reflection
MR^3: Meta-Model Management based on RDFs Revision Reflection
 
Wed garcia hands_on_d_bpedia preservation
Wed garcia hands_on_d_bpedia preservationWed garcia hands_on_d_bpedia preservation
Wed garcia hands_on_d_bpedia preservation
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
 
Webinar@AIMS: LODE-BD
Webinar@AIMS: LODE-BDWebinar@AIMS: LODE-BD
Webinar@AIMS: LODE-BD
 
Going for GOLD - Adventures in Open Linked Metadata
Going for GOLD - Adventures in Open Linked MetadataGoing for GOLD - Adventures in Open Linked Metadata
Going for GOLD - Adventures in Open Linked Metadata
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015
 
RDTF Metadata Guidelines: an update
RDTF Metadata Guidelines: an updateRDTF Metadata Guidelines: an update
RDTF Metadata Guidelines: an update
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial Metadata
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Server
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2
 
ECSA 2013 (Cuesta)
ECSA 2013 (Cuesta)ECSA 2013 (Cuesta)
ECSA 2013 (Cuesta)
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published record
 
Linked Data
Linked DataLinked Data
Linked Data
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
semanticweb
semanticwebsemanticweb
semanticweb
 

More from PRELIDA Project

Steps towards a Data Value Chain
Steps towards a Data Value ChainSteps towards a Data Value Chain
Steps towards a Data Value ChainPRELIDA Project
 
Preserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructurePreserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructurePRELIDA Project
 
Organizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data PreservationOrganizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data PreservationPRELIDA Project
 
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...PRELIDA Project
 
Experiments with evolving RDF
Experiments with evolving RDFExperiments with evolving RDF
Experiments with evolving RDFPRELIDA Project
 
Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...
Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...
Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...PRELIDA Project
 
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyHIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyPRELIDA Project
 
CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical DataCEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical DataPRELIDA Project
 
DIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for PreservationDIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for PreservationPRELIDA Project
 
DIACHRON Project Overview
DIACHRON Project OverviewDIACHRON Project Overview
DIACHRON Project OverviewPRELIDA Project
 
D.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital PreservationD.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital PreservationPRELIDA Project
 
Introduction to PRELIDA Consolidation and Dissemination Workshop
Introduction to PRELIDA Consolidation and Dissemination WorkshopIntroduction to PRELIDA Consolidation and Dissemination Workshop
Introduction to PRELIDA Consolidation and Dissemination WorkshopPRELIDA Project
 
D3.1 State of the art assessment on Linked Data and Digital Preservation
D3.1 State of the art assessment on Linked Data and Digital PreservationD3.1 State of the art assessment on Linked Data and Digital Preservation
D3.1 State of the art assessment on Linked Data and Digital PreservationPRELIDA Project
 
Towards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectTowards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectPRELIDA Project
 

More from PRELIDA Project (16)

Steps towards a Data Value Chain
Steps towards a Data Value ChainSteps towards a Data Value Chain
Steps towards a Data Value Chain
 
Preserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructurePreserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructure
 
Organizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data PreservationOrganizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data Preservation
 
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...
 
Experiments with evolving RDF
Experiments with evolving RDFExperiments with evolving RDF
Experiments with evolving RDF
 
Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...
Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...
Privacy‐Aware Preservation: Challenges from the Perspective of a Linked Data ...
 
Media Ecology Project
Media Ecology ProjectMedia Ecology Project
Media Ecology Project
 
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyHIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
 
CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical DataCEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
 
DIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for PreservationDIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for Preservation
 
DIACHRON Project Overview
DIACHRON Project OverviewDIACHRON Project Overview
DIACHRON Project Overview
 
D.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital PreservationD.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital Preservation
 
Introduction to PRELIDA Consolidation and Dissemination Workshop
Introduction to PRELIDA Consolidation and Dissemination WorkshopIntroduction to PRELIDA Consolidation and Dissemination Workshop
Introduction to PRELIDA Consolidation and Dissemination Workshop
 
D3.1 State of the art assessment on Linked Data and Digital Preservation
D3.1 State of the art assessment on Linked Data and Digital PreservationD3.1 State of the art assessment on Linked Data and Digital Preservation
D3.1 State of the art assessment on Linked Data and Digital Preservation
 
Towards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectTowards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA project
 
Introduction to Prelida
Introduction to PrelidaIntroduction to Prelida
Introduction to Prelida
 

Recently uploaded

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

PRELIDA Project Draft Roadmap

  • 1. PRELIDA Project Sotiris Batsakis & Grigoris Antoniou University of Huddersfield 1
  • 2. Overview 1. Background 1. Digital Preservation 2. Linked Data 2. Use Case 1. DBpedia 3. Identified Gap 4. Roadmap 1. Dataset Ingestion 2. Dealing with Changes 5. Recommendations University of Huddersfield 2
  • 3. 1.1Digital Preservation introduction  “Activities ensuring access to digital objects (data and software) as long as it is required. In addition to that, preserved content must be authenticated and rendered properly upon request.”  Problems  File format obsolescence  Storage medium failure  Value and function of the digital object cannot be determined anymore  Simple loss of the digital objects University of Huddersfield 3
  • 4. Preservation Recommendations  Using file formats based on open standards  Using the services of digital archives to store the objects for the long-term  Creating and maintaining high quality documentation specifically developed to create preservation metadata so in the future the digital objects can be reused  Making use of multiple storage facilities to reduce the risk that the objects get lost. University of Huddersfield 4
  • 5. OAIS reference model  OAIS reference model (Reference Model for an Open Archival Information System) establishes a common framework of terms and concepts relevant for the long term archiving of digital data.  The model details the processes around and inside of the archive, including the interaction with the user. But, it does not make any statements about which data would need to be preserved.  The OAIS reference model has been adopted as ISO standard 14721 University of Huddersfield 5
  • 6. OAIS definitions  OAIS is defined as an archive and an organisation of people and systems that has accepted the responsibility to preserve information and make it available for a “Designated Community”  A Designated Community is defined as “an identified group of potential consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities”. University of Huddersfield 6
  • 7. OAIS Archival Information Package University of Huddersfield 7
  • 8. 1.2 Linked Data  Use the Web as a platform to publish and re-use identifiers that refer to data,  Use a standard data model for expressing the data (RDF). University of Huddersfield 8
  • 9. Publishing RDF Data  As annotation to Web documents  RDF data is included within the HTML code of Web pages.  Software with suitable parsers can then extract the RDF content for the pages.  As Web documents  RDF data is serialized and stored on the Web.  RDF documents are served next to HTML documents and a machine can request specific type of documents.  As a database  RDF can be stored in optimised graph databases (“triple store”) and queried using the SPARQL query language. University of Huddersfield 9
  • 10. Publishing RDF data example University of Huddersfield 10  DBpedia publishing:  As annotations through the RDFa markup present in the HTML page  As RDF content via content-negotiation with the resource  With a SPARQL query sent to the end point
  • 11. Preservation of Linked Data  Web Data can be preserved just like any web page, especially if there is structured data embedded in it (RDFa, Microdata …).  It is possible to extract structured data from any Web page that contains annotations in order to expose it to the user via various serialisation formats.  Database Data can be preserved just like any database. RDF is to be considered as the raw bits of information which are serialised in RDF/XML (among others).  The preservation of such files is similar to what would be done for relational databases with the goal of providing data consumers with a serialisation format that can be consumed with current software. University of Huddersfield 11
  • 12. Semantics  The archiving of a Web document consists of its own text and other Web resources that are embedded in it. This view differs for a Web of resources where the links between the resources matter and evolve in time.  On a global graph interconnecting several data sources through shared conceptualization, this context is infinite. The only way to preserve the Web of Data in a meaningful way would be to snapshot it entirely, a scenario that is intractable from an architectural point of view. University of Huddersfield 12
  • 13. 2 DBpedia introduction
 DBpedia's objective is to extract structured knowledge from Wikipedia and make it freely available on the Web using Semantic Web and Linked Data technologies.
 Data is extracted in RDF format and can be retrieved:
   Directly (RDF)
   Through a SPARQL end-point (a sample query is sketched below)
   As Web pages
 Knowledge from the different language editions of Wikipedia is extracted, along with links to other Linked Open Data datasets.
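To make the end-point access concrete, here is a minimal sketch of a SPARQL query of the kind that could be sent to the public DBpedia endpoint (http://dbpedia.org/sparql); the chosen resource and properties are merely plausible examples, not part of the original deck:

    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Retrieve the English label and abstract of one DBpedia resource
    SELECT ?label ?abstract WHERE {
      <http://dbpedia.org/resource/Wikipedia> rdfs:label ?label ;
                                              dbo:abstract ?abstract .
      FILTER (lang(?label) = "en" && lang(?abstract) = "en")
    }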
  • 14. LOD cloud
 [Diagram: the Linked Open Data cloud]
  • 15. DBpedia data extraction
 Wikipedia content is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).
 DBpedia content (data, and metadata such as the DBpedia ontology) is available to end users under the same terms and licenses as the Wikipedia content.
  • 16. DBpedia preservation (1/2)
 DBpedia preserves the different versions of the entire dataset by means of DBpedia dumps, corresponding to a versioning mechanism.
 Besides the versioned dumps, DBpedia Live keeps track of changes in Wikipedia and extracts the newly changed information from Wikipedia infoboxes and text into RDF format.
 DBpedia Live also contains metadata about the part of the Wikipedia text from which the information was extracted, the user who created or modified the corresponding data, and the date of creation or last modification. The incremental modifications of DBpedia Live are also archived (an illustrative changeset follows below).
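To illustrate what such archived incremental modifications can look like, changesets in the style of DBpedia Live can be thought of as paired lists of removed and added triples; the following N-Triples fragment is a hypothetical sketch (the resource and the numbers are invented):

    # Removed in this update (hypothetical)
    <http://dbpedia.org/resource/Example_City> <http://dbpedia.org/ontology/populationTotal> "152000"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .

    # Added in this update (hypothetical)
    <http://dbpedia.org/resource/Example_City> <http://dbpedia.org/ontology/populationTotal> "154300"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .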
  • 17. DBpedia preservation (2/2)
 The DBpedia dataset contains links to other datasets containing both definitions and data (e.g., Geonames).
 There are currently more than 27 million links from DBpedia to other datasets.
 The DBpedia archiving mechanisms also preserve the links to these datasets, but not their content.
 The preserved data is DBpedia content in RDF or tabular (CSV) format.
 Rendering and querying software is not part of the archive, although the software that extracts the DBpedia dataset from Wikipedia infoboxes and text is preserved on GitHub.
  • 18. 2.1 DBpedia Use Cases
 Based on the possible interactions and user requests:
   Request of archived data in RDF or CSV format
   Request of rendered data in Web page format
   Submission of SPARQL queries over the archived versions of the data
 Time of interest:
   Point
   Interval
 External sources involved or not
  • 19. Use Case 1 - RDF data archiving and retrieval
 DBpedia data (in RDF or tabular CSV format) is archived, and the user requests specific data (or the entire dataset) as it was at a specific date in the past.
 The preservation mechanism must be able to provide the requested data in RDF (or table) format.
 Timestamps specifying the interval during which the data was valid (i.e., the date the data was created or last modified before the date specified in the user request, and the first modification or deletion date after that date) are a desirable feature of the mechanism.
  • 20. Use Case 1
 Retrieving data for a specific time interval, e.g., 2010-2012, instead of a specific date is a more complex case, since all versions of the data whose validity intervals overlap the request interval must be returned, together with those validity intervals. A sketch of one possible representation follows below.
 Currently, complex requests involving intervals are not handled by the DBpedia archiving mechanism. RDF data containing links to other LOD datasets can be retrieved for specific time points, but the content behind links to external LOD is not preserved.
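One conceivable representation supporting such interval requests (an assumption for illustration, not something the current DBpedia mechanism implements) stores each archived version in its own named graph and annotates the graphs with validity intervals; the graph names and the :validFrom/:validUntil properties are made up:

    # TriG sketch: one named graph per archived version
    @prefix :    <http://example.org/archive#> .
    @prefix dbo: <http://dbpedia.org/ontology/> .
    @prefix dbr: <http://dbpedia.org/resource/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    :g2010 { dbr:Example_City dbo:populationTotal "152000" . }
    :g2011 { dbr:Example_City dbo:populationTotal "154300" . }

    :g2010 :validFrom "2010-01-01"^^xsd:date ; :validUntil "2011-06-30"^^xsd:date .
    :g2011 :validFrom "2011-07-01"^^xsd:date ; :validUntil "2013-02-15"^^xsd:date .

An interval request for 2010-2012 then becomes a query for all versions whose validity interval overlaps the request interval (xsd:date comparison as supported by common SPARQL engines):

    PREFIX :    <http://example.org/archive#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    # Versions valid at some point within [2010-01-01, 2012-12-31]
    SELECT ?g ?from ?until WHERE {
      ?g :validFrom ?from ; :validUntil ?until .
      FILTER (?from <= "2012-12-31"^^xsd:date && ?until >= "2010-01-01"^^xsd:date)
    }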
  • 21. Use case 2: Rendering data as a Web page
 The user requests the DBpedia data for a specific topic, at a given temporal point or interval, presented in Web page format.
 The preservation mechanism should be able to return the data in RDF format; if the description was modified during the given interval, all corresponding descriptions, the sub-interval during which each distinct description was valid, the modification history, the differences between versions, and the editors should be returned, as in the first use case.
  • 22. Use case 2
 Rendering the requested data as a Web page introduces the following problem: can the functionality of external links be preserved and supported as well, or not?
 Currently, rendering software is not part of the preservation mechanism, although using a standard representation (RDF) minimizes the risk of not being able to render the data in the future.
  • 23. Use case 3: SPARQL endpoint functionality
 Reconstruct the functionality of the DBpedia SPARQL endpoint at a specific temporal point in the past, specifying both the query and the time point (this use case can be extended with interval queries, as in use cases 1 and 2 above).
 There are different kinds of queries that must be handled, corresponding to different use cases:
   a) Queries over RDF data in the DBpedia dataset only
   b) Queries spanning the DBpedia dataset and datasets directly connected to it (e.g., Geonames)
   c) Queries spanning the DBpedia data and external datasets connected to DBpedia only indirectly (i.e., through links to the datasets of case b)
  • 24. Use case 3
 Currently, the SPARQL end-point functionality is not directly preserved: users must retrieve the archived data and query it with their own SPARQL end-point.
 They will then be able to issue queries of type (a) above, but not queries of type (b) whenever the content behind external links is requested; a sketch of a type (b) query is given below.
 Requests of type (c) cannot be handled by the current preservation mechanism either.
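For concreteness, a type (b) query is naturally expressed with SPARQL 1.1 federation: it starts in the archived DBpedia data and follows owl:sameAs links into Geonames. This is only a sketch of the query shape; the SERVICE URL is hypothetical, since no preserved Geonames endpoint exists:

    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX gn:  <http://www.geonames.org/ontology#>

    # Type (b): join DBpedia data with directly linked Geonames data
    SELECT ?city ?population WHERE {
      ?city dbo:country <http://dbpedia.org/resource/Greece> ;
            owl:sameAs ?gnCity .
      SERVICE <http://geonames.example.org/sparql> {  # hypothetical endpoint
        ?gnCity gn:population ?population .
      }
    }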
  • 25. 3. LD characteristics
 Digital objects can be:
   static vs dynamic
   complex vs simple
   active vs passive
   rendered vs non-rendered
 Linked Data is dynamic, complex, passive and typically non-rendered.
  • 26. LD is dynamic
 Do people need archived versions of LOD datasets, or is the most up-to-date version all that is needed?
 Linked Data is usually dynamic, which is not the case in many preservation applications for other types of objects, so older versions should be preserved. This is, for example, the case for DBpedia, where both older versions and incremental changes are archived.
 Different statements may be made at any time, so the “boundary” of the object under consideration changes over time.
  • 27. LD is Complex
 Linked Data is typically about expressing statements (facts) whose truth or falsity is grounded in the context provided by all the other statements available at that particular moment.
 Linked Data is a form of formal knowledge. As for any kind of data or information, the problem for long-term preservation is not the preservation of the object as such, but the preservation of the meaning of the object.
  • 28. LD rendering
 Linked Data is typically not rendered, and adopts standards such as RDF that are open, widely adopted and well supported.
 This simplifies the preservation mechanism if the only preserved objects are the data themselves.
 This is the case in the current DBpedia archiving mechanism.
 If, on the other hand, rendering is a requirement, then appropriate software, and perhaps platform emulators, must be preserved in addition to the data themselves.
  • 29. LD is passive
 Linked Data is usually represented in the form of statements or objects (typically RDF triples), which are not applications.
 Besides preserving the data, the software that handles the data should in some cases be preserved as well.
 Rendering functionality, or an access method such as a SPARQL endpoint for the archived DBpedia data, is an example not handled by the existing archiving mechanism.
  • 30. Additional issues
 Linked Data is typically distributed, and the persistence of the preserved objects depends on all the individual parts and on the ontologies/vocabularies with which the data is expressed.
 A lot of data essentially depends on OWL ontologies that are created/maintained/hosted by others.
 Further issues include authenticity, uncertainty, Web infrastructure, accessibility in its various forms, versioning, and the preservation of metadata referring to triples.
  • 31. 4. Draft Roadmap
 The notion of designated communities only helps partially:
   By default we cannot know which communities might be interested in the archived material in the future.
   This may be even more sensitive in the case of Linked Data, which is typically created with the idea of sharing in mind and, as such, tends to be less community-specific.
 Linked Data shifts away from the paradigm of self-containedness often assumed by archives.
  • 32. OAIS Compliance (1/2)
 An Archival Information Package consists of Content Information and Preservation Description Information.
 Content Information contains the data to be preserved:
   the Data Object is the data to be preserved;
   Representation Information is the information “needed to make the Content Data Object understandable to the Designated Community”.
  • 33. OAIS Compliance (2/2)
 Preservation Description Information includes various types of knowledge required (or rather, recommended) for the preservation of the content:
   Provenance
   Context
 Linked Data refers to resources outside the graph it belongs to, so when archiving one graph, one has to decide what to do with the links going out of that graph.
 An LD dataset is not an OAIS; it is, at best, just the content part of an OAIS.
  • 34. 4.1 Dataset Ingestion
 Ideally, all dependencies would be ingested:
   Feasible for RDF and OWL vocabularies
   Defining the boundary is a problem
   It exposes the archive to the risk of archiving a large portion of the LOD cloud
 For all other kinds of resources the same boundary problem exists, and in addition there is the problem of digital rights management (DRM).
  • 35. Serialization
 Structure Information is given by the (definition of the) serialization format.
 There are many serializations (all Unicode-based); RDF 1.1 N-Quads is a candidate (see the fragment below).
 PRELIDA will seek to establish contact with the W3C Data Best Practices group.
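For illustration, an N-Quads file records one statement per line, with an optional fourth element naming the graph the triple belongs to; in this sketch the graph IRI and the literal are invented:

    <http://dbpedia.org/resource/Wikipedia> <http://dbpedia.org/ontology/abstract> "Wikipedia is a free online encyclopedia."@en <http://example.org/graphs/dbpedia-2014> .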
  • 36. Dataset Description
 The VoID vocabulary for RDF dataset metadata (a sketch follows below):
   General metadata, following the Dublin Core model
   Access metadata, describing how the RDF data can be accessed using various protocols
   Structural metadata, describing the structure and schema of a dataset, useful for tasks such as querying and data integration
   Descriptions of the links between datasets, helpful for understanding how multiple datasets are related and can be used together
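A minimal VoID description in Turtle might look as follows; the dataset IRIs and the dump URL are placeholders:

    @prefix void:    <http://rdfs.org/ns/void#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix owl:     <http://www.w3.org/2002/07/owl#> .
    @prefix :        <http://example.org/datasets#> .

    :DBpedia a void:Dataset ;
        dcterms:title "DBpedia" ;                             # general metadata
        void:sparqlEndpoint <http://dbpedia.org/sparql> ;     # access metadata
        void:dataDump <http://example.org/dumps/dbpedia.nq.gz> .

    :Geonames a void:Dataset .

    # Link metadata: the owl:sameAs links from DBpedia to Geonames
    :DBpedia_Geonames a void:Linkset ;
        void:subjectsTarget :DBpedia ;
        void:objectsTarget  :Geonames ;
        void:linkPredicate  owl:sameAs .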
  • 37. Data Catalogs
 DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web (a sketch follows below):
   Publishers increase the discoverability of their datasets
   Applications can easily consume metadata from multiple catalogs
   Catalogs can be published in a decentralized way
   DCAT metadata can serve as a manifest file to facilitate digital preservation
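A catalog entry for an archived dump could be described with DCAT roughly as follows; all example.org IRIs are placeholders:

    @prefix dcat:    <http://www.w3.org/ns/dcat#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix :        <http://example.org/catalog#> .

    :archiveCatalog a dcat:Catalog ;
        dcterms:title "LDD preservation catalog" ;
        dcat:dataset :dbpediaDump2014 .

    :dbpediaDump2014 a dcat:Dataset ;
        dcterms:title "DBpedia dump, 2014 snapshot" ;
        dcat:distribution [
            a dcat:Distribution ;
            dcat:downloadURL <http://example.org/dumps/dbpedia-2014.nq.gz> ;
            dcat:mediaType "application/n-quads"
        ] .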
  • 38. Provenance Information
 The PROV vocabulary is recommended by the W3C for expressing Provenance Information.
 PROV is designed to represent how an RDF dataset was made and what its history is, before and after ingest (a sketch follows below).
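A fragment of such a PROV description in Turtle could look like the following; the entity and activity names are invented:

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix :     <http://example.org/provenance#> .

    # The archived dataset and the extraction run that produced it
    :dbpediaDump2014 a prov:Entity ;
        prov:wasDerivedFrom :wikipediaDump2014 ;
        prov:wasGeneratedBy :extractionRun .

    :extractionRun a prov:Activity ;
        prov:used              :wikipediaDump2014 ;
        prov:wasAssociatedWith :extractionFramework ;
        prov:endedAtTime       "2014-05-02T00:00:00Z"^^xsd:dateTime .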
  • 39. Reasoner preservation
 The semantics of RDF/OWL
 The RDF/OWL vocabulary the graph is built on
 Consider the preservation of related application software:
   OWL/RDF reasoners
   SPARQL endpoints
  • 40. 4.2 Changes to the technology
 Several hardware and software components are involved, and any of these components may malfunction or become obsolete.
 It is the responsibility of the archive to monitor such changes and take action (such as migrating the data to a new, W3C-recommended format, or to a new medium).
 Standard preservation practices are adequate here; W3C archiving recommendations (regarding format, compatibility and data migration) will be required.
  • 41. Changes to the Content Data
 Data in the information system changes: some element is deleted or updated, or a new data element is created.
 The DBpedia LDD is continuously updated.
 Existing mechanisms and policies are adequate for the internal data, but links to external datasets are not handled by existing archives.
 When the owner of the data decides that the changes are significant enough, a new snapshot of the data is taken. The archive’s responsibility is, possibly, to keep track of the versioning relationships.
  • 42. Changes to the Representation Information
 Representation Information may change, either because the holding archive updates it or because some event outside the holding archive requires a change. For example:
   The serialization format of the preserved LDD becomes obsolete, and the holding archive migrates the data to the newly recommended format.
   The organization producing the data goes out of business, and the responsibility for the data is transferred to a different organization.
 Recommendations for detecting changes (crawling strategies or a notification mechanism) must be defined.
  • 43. Changes to the vocabularies
 Vocabularies may change at any time, due to the addition of new terms (and of the axioms involved) or to the deprecation of old terms.
 The data being preserved does not change directly, but a change to the vocabularies may influence its semantics.
 This case is tackled by the joint action of the archive and the data holder.
 Since vocabularies are usually small in size, an aggressive crawling strategy can be applied.
 A set of centrally preserved core vocabularies is also an option.
  • 44. Changes to other web resources
 This case is tackled by the joint action of the archive and the data holder, as in the case of external RDF datasets and vocabularies.
 The problem is similar to the Web archiving problem, and a complete technical solution for all cases is not considered feasible. Nevertheless, partial solutions are feasible.
 Additional issues:
   Size
   DRM
  • 45. Changes to the knowledge base
 A piece of information is considered usable by the OAIS if it can be understood on the basis of the knowledge base of the Designated Community.
 The knowledge base is:
   Distributed
   In various formats (text, images)
 The issues are similar to those of Web archiving.
  • 46. Dealing with changes
 Propagating an ontology change to the archived descriptions that use the changed term (a sketch of a lightweight annotation follows below).
 Lightweight approaches:
   Add a reference to some source explaining the difference between the current and the previous notion
   Just indicate that there has been a change in the context (vocabulary)
 Automated approaches:
   Replace the old term with the new one
   Keep both the old and the new version
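As one lightweight possibility (an illustration, not a PRELIDA prescription), a deprecated vocabulary term can be annotated in place, pointing consumers to its replacement and to a change log; the vocabulary and change-log IRIs are placeholders:

    @prefix owl:     <http://www.w3.org/2002/07/owl#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:      <http://example.org/vocab#> .

    # Mark the old term as deprecated and point to its successor
    ex:oldTerm owl:deprecated "true"^^xsd:boolean ;
        dcterms:isReplacedBy ex:newTerm ;
        rdfs:seeAlso <http://example.org/vocab/changelog#oldTerm> .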
  • 47. 5. Recommendations (1/3)
 Selection and appraisal of data:
   Identify the boundaries of the LDD that has to be preserved
   Gather every RDF dataset that is relevant for the LDD to be preserved
   The default strategy is complete closure, both for vocabularies (ontologies) and for instances
 Whenever an LD dataset is collected into an archive, the owner of the dataset should be alerted that any change to it is relevant for the collector; the collector is added to a list of subscribers to be notified of any change.
  • 48. Recommendations (2/3)
 Submit data in a standard serialization (such as N-Quads).
 Include VoID/DCAT/PROV descriptions in the Representation Information.
 Specification documents should also be preserved (i.e., the RDFS, OWL and serialization specs).
 Time-stamp the crawls of the collected datasets.
 Reasoners and SPARQL engines (triple stores) are also to be preserved, for access purposes.
 Submit every representation (HTML+RDFa, JSON) served.
  • 49. Recommendations (3/3)
 The distributed nature of Linked Data suggests that a distributed approach may be more appropriate than a traditional one for the preservation of LDDs.
 In such an approach, an OAIS can be spread over several archives, each storing a part of the Content Data.
 This distributed structure would be better suited to archiving an LDD, whose references to external entities can be managed as references to the other OAISs managing those entities.
  • 50. Conclusion
 Linked Data and Linked Open Data are a specific form of digital objects. The problem for LOD does not lie with the notation of the data model (RDF triples).
 A key differentiation is between LOD living on the Web, whose main part consists of URIs pointing to Web resources, and LD living in a database-like environment; it is the Web-living part that creates most of the problems for archiving.
 Any attempt to archive LOD as part of the living Web shares the problems of archiving Web resources.
 Issues, recommendations and best practices are outlined here.
 This is a first step towards an LD preservation roadmap.