Roadmap from ESEPaths to EDMPaths:
a note on representing annotations resulting from automatic
enrichment
Aitor Soroa, Eneko Agirre, Arantxa Otegi, Antoine Isaac
February 10, 2014

Contents
1 Introduction
2 ESEPaths
3 Roadmap for basic conversion of ESEPaths to EDM
4 Using Open Annotation to represent attributes in relations
4.1 Offsets and selectors
5 Conclusion

1 Introduction

This document is a case study on using the Europeana Data Model (EDM) [Doerr et al., 2010] [1]
for representing annotations of Cultural Heritage Objects (CHOs). One of the main goals of
the PATHS project is to augment CHOs (items) with information that will enrich the user’s
experience. The additional information includes links between items in cultural collections
and from items to external sources like Wikipedia. To this end, the PATHS project has
applied Natural Language Processing (NLP) techniques to a subset of the items in Europeana.
Using these techniques, PATHS enriches CH items with the following information
[Agirre and de Lacalle, 2011, Otegi et al., 2012]:
• Informativeness score: each item is associated with a value indicating the overall
“informativeness” of the item, which is derived from the amount of text in its metadata and
is inversely proportional to the number of items in which the same text is mentioned.
• Vocabulary terms: vocabulary terms associated with the item. These terms are used for
creating the tag clouds shown to the user.
[1] http://pro.europeana.eu/edm-documentation

• Event information associated with the item: CHOs often provide event- or activity-related
information, such as people walking. We enrich the items by means of a predefined
list of words that can be used to refer to events. This data allows answering requests
like “give me items with people running” or “items with people playing”.
• Related items: CH items which are semantically related.
• Background links that relate CH items with external resources such as Wikipedia. When
linking a CH item with some external resource, we keep track of the original text snippet
from which the association is derived. For instance, an item could be related to a
Wikipedia article because of some text snippet in the dc:description field. In such a case
we store the reference to the field and the offset as attributes [2]. Note, however, that in
some cases there is little point in keeping the text, because the enrichment is based on
a combination of metadata fields.
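The event enrichment described in the list above (matching a predefined list of words and remembering where each match was found) can be illustrated with a minimal Python sketch. The word list, the mapping from surface forms to canonical forms and the function name find_events are assumptions made for this illustration; the actual PATHS lexicon and matching procedure are not given in this document.

```python
import re

# Hypothetical event lexicon: surface form -> canonical form.
# The real PATHS word list is not reproduced in this document.
EVENT_LEXICON = {
    "play": "play", "playing": "play", "played": "play",
    "run": "run", "running": "run",
    "walk": "walk", "walking": "walk",
}


def find_events(field_name, text):
    """Return one dict per event word found in text, with character offsets."""
    events = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0).lower()
        if word in EVENT_LEXICON:
            events.append({
                "field": field_name,
                "canonical_form": EVENT_LEXICON[word],
                "start_offset": match.start(),  # first character of the anchor
                "end_offset": match.end(),      # one past the last character
                "text": match.group(0),
            })
    return events


description = "Children playing near the windmill while a man is walking by."
events = find_events("dc:description", description)
for event in events:
    print(event)
```

Each result keeps the field name and the character offsets of the matched word, which is the kind of provenance information that the background links also record.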
The PATHS project started in 2011 and adopted the representation schema of choice
at the time, ESE [3]. We extended it to a format called ESEPaths to represent the enrichment
information just mentioned [Agirre and de Lacalle, 2011, Otegi et al., 2012]. In this document
we describe a proposal for representing PATHS enrichments following EDM (Europeana Data
Model), the new data model used by Europeana.
The document is structured as follows. We first introduce ESEPaths (Section 2), then
the roadmap for a simple conversion to EDM (Section 3). Section 4 explains some possible
(advanced) solutions to the problems identified in Section 3. Finally, Section 5 draws the
conclusions.

2 ESEPaths

PATHS has defined a format derived from ESE, called ESEPaths, which adds the enrichment
information described above. Specifically, ESEPaths adds the following fields:
• <paths:informativeness> with the informativeness score of the ESE record.
• <paths:vocabulary>, which links the ESE record with vocabulary terms. The element
has the following attributes:
– name: name of the external vocabulary.
– URI: the address (URI) of the specific category in the vocabulary.
– confidence: the confidence of the association.
• <paths:event>, which links the ESE record with external events. The element has the
following attributes:
– source: the name of the external resource of the event (for instance, WordNet).
– canonical_form: the canonical word form of the annotated event.
– confidence: confidence of the association.
[2] Keeping track of this information is useful, for instance, for an interface showing those annotations, as it can emphasize the specific snippet and link it to the Wikipedia/dbpedia article when the user points to it.
[3] http://www.europeana.eu/schemas/ese/
<record>
  <!-- Existing ESE record -->
  <dc:identifier>http://www.thebowesmuseum.org.uk/10432/</dc:identifier>
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
  <dc:description>This is a random-coursed blue lias ...</dc:description>
  <dcterms:isPartOf>Bowes Museum</dcterms:isPartOf>
  <dc:subject>1670</dc:subject>
  <dc:type>Image</dc:type>
  <europeana:provider>CultureGrid</europeana:provider>
  <europeana:isShownAt>http://www.thebowesmuseum.org.uk/10432/</europeana:isShownAt>
  <europeana:hasObject>false</europeana:hasObject>
  <europeana:country>uk</europeana:country>
  <europeana:type>IMAGE</europeana:type>
  <europeana:language>en</europeana:language>
  <!-- ESEPaths augmentation -->
  <!-- item informativeness -->
  <paths:informativeness>0.7</paths:informativeness>
  <!-- vocabulary mapping -->
  <paths:vocabulary confidence="0.8" source="wikicat"
                    URI="http://en.wikipedia.org/wiki/Category:Tower_mills">
    Tower Mills</paths:vocabulary>
  <!-- events -->
  <paths:event confidence="0.8" source="wordnet" canonical_form="play"
               start_offset="120" end_offset="127" field="dc:description">
    playing</paths:event>
  <!-- related items -->
  <paths:related_item confidence="0.8" field="dc:subject" field_no="1"
                      method="LDA">
    http://www.europeana.eu/portal/record/09405t/A6F9A
  </paths:related_item>
  <!-- background links items -->
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
                         field="dc:subject" confidence="0.015"
                         method="wikipedia-miner-1.2.0"
                         title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>

Figure 1: Example of an ESEPaths record

• <paths:related_item>, which links the ESE record with related CH items. The element
has the following attributes:
– confidence: confidence of the association.
– method: which method produced the association.
– field: the name of the ESE field whose content suggests the similarity relation.
– field_no: the position of the ESE field described above (useful in case the ESE
record contains more than one field with the same name).
• <paths:background_link>, which links the ESE record with an item from an external
resource. The element has the following attributes:
– source: the name of the external resource.
– start_offset: the offset (in characters) within the field element where the text
anchor begins.
– end_offset: the offset (in characters) within the field element where the text anchor
ends.
– field: the field of the ESE record where the anchor for this relation is located.
– confidence: confidence of the association.
– method: which method produced the association.
– title: title of the URL which the background link points to.
– sentiment: polarity of the textual information included in the corresponding link.
It has fixed values, namely “pos” for positive results, “neg” for negative and “neu” for
neutral.
Figure 1 shows an example of a CH record in ESEPaths. The first lines are just a copy of
the original ESE record, whereas the new elements (in the paths namespace) are at the end.
Note that the identifiers (including URIs) are not real and have been shortened so that the
listing fits on the page.
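As an illustration of how such a record could be consumed, the following Python sketch reads the enrichment elements of an abridged version of the record in Figure 1 using only the standard library. The namespace URI bound to the paths prefix and the function name read_enrichments are assumptions made for this example; the figure does not show namespace declarations.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map. The Dublin Core URI is the standard one;
# the URI for the paths prefix is a placeholder chosen for this sketch.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "paths": "http://www.paths-project.eu/ns/",
}

# Abridged version of the record in Figure 1, with namespace declarations added.
RECORD = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:paths="http://www.paths-project.eu/ns/">
  <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
  <paths:informativeness>0.7</paths:informativeness>
  <paths:vocabulary confidence="0.8" source="wikicat"
      URI="http://en.wikipedia.org/wiki/Category:Tower_mills">
    Tower Mills</paths:vocabulary>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015"
      method="wikipedia-miner-1.2.0" title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>
"""


def read_enrichments(xml_text):
    """Collect the informativeness score and all PATHS annotations of a record."""
    record = ET.fromstring(xml_text.strip())
    info = record.find("paths:informativeness", NS)
    result = {
        "informativeness": float(info.text) if info is not None else None,
        "annotations": [],
    }
    for kind in ("vocabulary", "event", "related_item", "background_link"):
        for element in record.findall(f"paths:{kind}", NS):
            annotation = dict(element.attrib)  # attributes such as source, field
            annotation["kind"] = kind
            annotation["value"] = (element.text or "").strip()
            if "confidence" in annotation:
                annotation["confidence"] = float(annotation["confidence"])
            result["annotations"].append(annotation)
    return result


enrichments = read_enrichments(RECORD)
print(enrichments["informativeness"])
for annotation in enrichments["annotations"]:
    print(annotation["kind"], annotation["value"], annotation.get("confidence"))
```

The sketch keeps every attribute of an annotation next to its value, which makes visible the information that the conversion to EDM has to preserve.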

3 Roadmap for basic conversion of ESEPaths to EDM

As said before, all the data produced by the PATHS project is encoded in the ESE
format extended with new elements. However, Europeana is switching from ESE to a new data
model, EDM. The main difference between ESE and EDM is that the latter is more expressive
and based on Semantic Web and Linked Data technologies (RDF, ontologies). In this section,
we outline the main design we devised for switching from ESEPaths to EDM.
We followed three main design criteria:
1. All PATHS annotations should be properly represented using EDM.
2. It must be possible to retrieve particular PATHS annotations.
3. We should depart as little as possible from standard EDM.

The first criterion states that all PATHS annotations should be described using EDM. As
will be shown below, some annotation attributes are difficult to represent in EDM and,
as a consequence, a compromise has to be made between describing PATHS annotations in their
full richness and using proper EDM concepts and properties for representing them. The second
criterion states that the EDM representation has to respect the types of the PATHS annotations.
For instance, it has to be possible to retrieve all background links of a particular CH item (as
opposed to, say, its related items). Finally, the last criterion states that we should use
widely adopted EDM classes and properties as far as possible. In particular, the EDM
representation should use the set of elements described by Europeana’s instructions for
providers [4], when possible.
We now describe the main steps for converting the PATHS annotations to EDM.

From ESEPaths to EDM
We start by describing the resources which are already in Europeana. These include a
Europeana ore:Aggregation resource with information about the digital aggregation process itself
(provider, etc.) [5].
<http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:dataProvider "English Heritage - Viewfinder";
    edm:provider "CultureGrid";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
    edm:rights <http://www.europeana.eu/rights/rr-f/>.

Europeana also provides a proxy for the CHO, attached to this aggregation [6]:
<http://data.europeana.eu/proxy/provider/09405/8F49> a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
    # Original ESE data
    dc:creator "Davies, J O";
    dc:date "[2001]";
    dc:title "Stembridge Windmill, High Ham, Somerset";
    dc:description "This is a random-coursed blue lias ...".

We now describe the way to represent the enrichment annotations as provided by the PATHS
project. We encapsulate these annotations into a new ore:Aggregation. This aggregation
resource records a first set of enrichments created by the PATHS project over the original CH
object. It includes all relevant information like provider name, access rights, etc. as well as the
annotations referring to the whole CH object, as opposed to enrichment information extracted
from some subset of the CH object’s metadata.
[4] http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders
[5] The resource identifier of the aggregation used in the example is not real. The real one should be provided by Europeana.
[6] Note again that the resource identifier of the proxy used in the example is not real.
<http://www.paths-project.eu/aggregation/paths/09405/8F49> a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:provider "PATHS";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:rights <http://www.paths-project.eu/rights/rr-f/>;
    # item informativeness
    paths:informativeness "0.7".

A few points are worth noting:
• The edm:isShownAt property points to the original record, as the PATHS project does not
store any information besides the enrichment of CH items itself.
• The edm:rights property refers to the annotated information (instead of the rights of
the original CH item).
• As said before, the paths:informativeness element pertains to the PATHS aggregation
resource because it refers to the CH object as a whole.
Finally, we create a proxy resource for the PATHS aggregation and describe the remaining
PATHS annotations within the scope of this resource (as its properties):
<http://www.paths-project.eu/proxy/paths/09405/8F49> a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://www.paths-project.eu/aggregation/paths/09405/8F49>;
    # vocabulary mapping
    edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
    # events
    edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
    # related items
    edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
    # background links items
    edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
    # Or <http://dbpedia.org/resource/Archaeology>

Representing various types of enrichment. As shown in the example, the proxy resource
relates the CH item with external resources such as vocabulary concepts, events, related items
or objects from some external sources (such as Wikipedia or dbpedia). As all the associations
are described by means of the high-level edm:isRelatedTo property, it is necessary to properly
declare the types of the external objects related to the CH object. Otherwise, there would be
no way to discriminate among the different types of PATHS annotations (for instance, there
would be no way to specifically retrieve the vocabulary concepts related to a CH object). As a
first solution, we can include a separate description for the resources linked to the CH object
using SKOS [7].
Within PATHS we define the following types of external resources:
• Related CH items are of type paths:RelatedItemConcept, which is in turn a subclass
of skos:Concept.
• Vocabulary concepts are of type skos:Concept.
• Events are of type paths:EventConcept, a subclass of skos:Concept. It represents any
concept which refers to a (type of) event (such as “run”, “play”, etc.).
• Background links are of type paths:BackgroundLinkConcept, a subclass of skos:Concept.
[7] http://www.w3.org/2004/02/skos
Note that these classes are meant to offer a way to discriminate among the different types
of annotations inside the PATHS project. The classes are therefore loosely defined, in the sense
that they do not describe the proper semantic type of the resources. For instance, PATHS
can relate a CH object with a dbpedia resource representing a place (New_York), a person
(Pablo_Picasso), etc. However, within the scope of the PATHS annotations, the only explicit
common type for all those resources is the one derived from their “background link” status.
Also note that, for the time being, Europeana would not be able to fully ingest data
that uses such sub-classes, as they depart from the set of elements described by Europeana’s
instructions for providers [8]. This would require Europeana to handle specialisations of EDM,
which has no precise schedule at the time of writing.
Based on the above, we also include the following statements in the example:
<http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
    skos:prefLabel "Tower Mills"@en.
<http://www.paths-project.eu/event/playing> a paths:EventConcept;
    skos:prefLabel "playing"@en.
<http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.
<http://en.wikipedia.org/wiki/Archaeology> a paths:BackgroundLinkConcept;
    skos:prefLabel "Archaeology"@en.

along with the definitions of these new types:
paths:EventConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Event Concept"@en ;
    skos:definition "A concept describing an Event"@en .
paths:RelatedItemConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Related Item Concept"@en ;
    skos:definition "A concept describing a CH record"@en .
paths:BackgroundLinkConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Background Link Concept"@en ;
    skos:definition "A concept describing an object from an external source such as dbpedia"@en .

The above definitions can be put next to the annotation data, in a separate file directly
provided to Europeana or others, or even served over the Web in a Linked Data scenario. The
whole EDM representation for the item is shown in Figure 2.
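To make the role of the type declarations concrete, the following Python sketch represents the statements above as plain (subject, predicate, object) tuples and retrieves the resources related to a proxy that have a given type. It only illustrates the idea under simplifying assumptions: the helper names to_triples and related_of_type are hypothetical, prefixed names are kept as plain strings, and no RDF library or subclass inference is involved.

```python
RDF_TYPE = "rdf:type"
PROXY = "<http://www.paths-project.eu/proxy/paths/09405/8F49>"

# Mapping from the kind of PATHS annotation to the class used to type its target.
KIND_TO_CLASS = {
    "vocabulary": "skos:Concept",
    "event": "paths:EventConcept",
    "related_item": "paths:RelatedItemConcept",
    "background_link": "paths:BackgroundLinkConcept",
}


def to_triples(proxy, annotations):
    """Emit one edm:isRelatedTo triple and one type triple per annotation."""
    triples = []
    for kind, target in annotations:
        triples.append((proxy, "edm:isRelatedTo", target))
        triples.append((target, RDF_TYPE, KIND_TO_CLASS[kind]))
    return triples


def related_of_type(triples, proxy, rdf_class):
    """Return the resources related to proxy that are declared to be of rdf_class."""
    related = {o for s, p, o in triples if s == proxy and p == "edm:isRelatedTo"}
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_class}
    return sorted(related & typed)


annotations = [
    ("vocabulary", "<http://www.paths-project.eu/vocabulary/Tower_mills>"),
    ("event", "<http://www.paths-project.eu/event/playing>"),
    ("related_item", "<http://www.europeana.eu/portal/record/09405t/A6F9A>"),
    ("background_link", "<http://en.wikipedia.org/wiki/Archaeology>"),
]
triples = to_triples(PROXY, annotations)
print(related_of_type(triples, PROXY, "paths:BackgroundLinkConcept"))
```

Without the type triples, all four targets would be indistinguishable, since they are all attached to the proxy through edm:isRelatedTo. Note that this sketch does no subclass reasoning, so asking for skos:Concept returns only the vocabulary concept.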
[8] http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders
<http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:dataProvider "English Heritage - Viewfinder";
    edm:provider "CultureGrid";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
    edm:rights <http://www.europeana.eu/rights/rr-f/>.
<http://data.europeana.eu/proxy/europeana/09405/8F49> a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
    # Existing ESE record
    dc:creator "Davies, J O";
    dc:date "[2001]";
    dc:title "Stembridge Windmill, High Ham, Somerset";
    dc:description "This is a random-coursed blue lias ...".
<http://www.paths-project.eu/aggregation/europeana/09405/8F49> a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:provider "PATHS";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:rights <http://www.paths-project.eu/rights/rr-f/>;
    # item informativeness
    paths:informativeness "0.7".
<http://www.paths-project.eu/proxy/europeana/09405/8F49> a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49>;
    # vocabulary mapping
    edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
    # events
    edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
    # related items
    edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
    # background links items
    edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
    # Or <http://dbpedia.org/resource/Archaeology>
<http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
    skos:prefLabel "Tower Mills"@en.
<http://www.paths-project.eu/event/playing> a paths:EventConcept;
    skos:prefLabel "playing"@en.
<http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.

Figure 2: EDM representation of the ESEPaths example

Using specific metadata fields to represent enrichments. Alternatively, if a PATHS
enrichment is known to be reliable, a new metadata field can be created for the CH object. For
instance, if the mapping of the CH record to a vocabulary concept is known to be correct, we can
create a new dc:subject field linking the CH record with the appropriate vocabulary concept.
Note however that PATHS enrichments are performed automatically, and it is not certain that
a concept enrichment derived from a dc:subject would result in a dc:subject relation between
the object and the concept. The link to the concept may have been identified based on only a
small part of the original field, thus missing some of the original semantics. Some manual
assessment is therefore needed before promoting an annotation to a proper metadata field.

4 Using Open Annotation to represent attributes in relations

The roadmap described in the previous section covers the main aspects of ESEPaths. However,
there is one piece of ESEPaths data that cannot be easily represented in EDM, because EDM
inherits RDF’s focus on binary relations: attributes on relations. Almost all annotations
created by the PATHS project have some information associated with them. In particular, many
annotations record a confidence value describing the level of certainty of the automatic
method that created the annotation.
One way to overcome this limitation in an RDF-based model is to reify the annotation into
an instance of a dedicated class, and represent the annotation attributes as properties of
that instance. For this we can re-use elements from the Open Annotation (OA) model [9]. Consider
this ESEPaths snippet:
<record>
  ...
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
                         field="dc:subject" confidence="0.015"
                         method="wikipedia-miner-1.2.0"
                         title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>

We would create the following oa:Annotation for it:
background_link1 a oa:Annotation ;
    a paths:BackgroundLinkAnnotation ;
    oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
    # Or <http://dbpedia.org/resource/Archaeology>
    paths:source <http://en.wikipedia.org> ;
    # Or <dbpedia.org>
    paths:confidence "0.015" .

[9] http://www.openannotation.org/spec/core/
In the example, the <paths:background_link> annotation has been converted (reified) to
an oa:Annotation resource background_link1 of type paths:BackgroundLinkAnnotation,
linked by the oa:hasTarget relation to the PATHS proxy resource. The attributes of the original
relation are now represented as properties of this new resource.
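The reification step can be sketched in Python as a function that turns the attributes of a <paths:background_link> element into the statements of an annotation resource. This is a sketch under stated assumptions: the function name reify_background_link, the mapping from source names to URIs and the blank-node label used as the annotation identifier are all choices made for this illustration, not part of the PATHS proposal.

```python
def reify_background_link(annotation_id, proxy_uri, attributes, body_uri):
    """Build Turtle-like statements for a reified background link annotation."""
    # Assumed mapping from the ESEPaths source name to a URI for that source.
    source_uris = {"wikipedia": "http://en.wikipedia.org"}
    source = source_uris.get(attributes["source"], attributes["source"])
    lines = [
        f"{annotation_id} a oa:Annotation ;",
        "    a paths:BackgroundLinkAnnotation ;",
        f"    oa:hasTarget <{proxy_uri}> ;",
        f"    oa:hasBody <{body_uri}> ;",
        f"    paths:source <{source}> ;",
        f'    paths:confidence "{attributes["confidence"]}" .',
    ]
    return "\n".join(lines)


# Attribute values taken from the ESEPaths snippet discussed in this section.
attributes = {
    "source": "wikipedia",
    "start_offset": "0",
    "end_offset": "11",
    "field": "dc:subject",
    "confidence": "0.015",
    "method": "wikipedia-miner-1.2.0",
    "title": "Archaeology",
}
turtle = reify_background_link(
    "_:background_link1",  # blank-node label, an assumption of this sketch
    "http://www.paths-project.eu/proxy/europeana/09405/8F49",
    attributes,
    "http://en.wikipedia.org/wiki/Archaeology",
)
print(turtle)
```

The remaining attributes (method, title, field and offsets) are deliberately left out, mirroring the example, which only carries over the source and the confidence.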
An alternative to the above approach would be to use the OA “motivation” property for
representing the type of the annotation. The OA motivation is meant to represent “the reasons
why the Annotation was created, not just the agents involved” [10], which fits particularly well
with the kind of information we want to represent. The “motivation” approach would lead to the
following triples:
background_link1 a oa:Annotation ;
    oa:motivatedBy paths:backgroundLinkMotivation ;
    oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
    # Or <http://dbpedia.org/resource/Archaeology>
    paths:source <http://en.wikipedia.org> ;
    # Or <dbpedia.org>
    paths:confidence "0.015" .

In this case, the <paths:background_link> object is of type oa:Annotation, and it is also
oa:motivatedBy paths:backgroundLinkMotivation, an instance of skos:Concept.
Both approaches described so far solve the main problem of attaching attributes to relations.
They also remove the need to define PATHS-specific relations such as paths:background_link,
which would conflict with the metadata fields currently used by EDM. Note however that the
properties of the newly defined reified annotations are still specific to PATHS (paths:source,
paths:confidence, etc.).
On a side note, using reified concepts for annotation raises the issue of whether we should
still keep the proxy-based representation next to it. Because all the PATHS enrichment
data is now attached to the reified annotation, the Proxy object described in Section 3 will
convey little or no information at all, compared to the original data.

4.1 Offsets and selectors

There is another piece of ESEPaths data which is not currently represented in EDMPaths,
namely the field and offset attributes of the relations. Because all PATHS annotations are
extracted from the textual content of some metadata field in the original CH record
representation, ESEPaths annotations keep track of the original text snippet (called the
anchor) which was used to derive the enrichment.
In order to track this kind of provenance information, EDM could re-use the selectors from
the Open Annotation model [11]. For instance, consider the following ESEPaths snippet:
<record>
  ...
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  ...
  <paths:background_link start_offset="0" end_offset="11"
                         field="dc:subject" ... >
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>
[10] http://www.openannotation.org/spec/core/core.html#Motivations
[11] http://www.openannotation.org/spec/core/specific.html#Selectors
It describes a “background link” annotation for the CH object “09405/8F49” which was extracted
by analyzing the offsets 0-11 of the dc:subject field of the original record. These offsets
could be translated to the following Open Annotation snippet:
background_link1 a oa:Annotation ;
    oa:hasTarget anchor1 ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> .
anchor1 a oa:SpecificResource ;
    oa:hasSource ??? ; # what type does this object have?
    oa:hasSelector selector1 .
selector1 a oa:TextPositionSelector ;
    oa:start 0 ;
    oa:end 11 .

As noted in the snippet, our problem is then to identify the resource that anchor1 should
point to as its source, and the type of that resource. This object should represent the
dc:subject field of CH record “09405/8F49”, but there is actually no way to describe this
with EDM. We thus decided to leave this piece of information out of our proposed solution.
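To illustrate what a text position selector conveys, the following Python sketch builds a selector from the ESEPaths offsets and uses it to recover the anchor text from a field value. It assumes that the start offset is inclusive and the end offset exclusive, which is consistent with the 0-11 offsets of the 11-character anchor “Archaeology” in the example. The field value and the function names make_selector and select_text are hypothetical.

```python
def make_selector(start_offset, end_offset):
    """Turn ESEPaths offset attributes into a selector description."""
    return {
        "type": "oa:TextPositionSelector",
        "oa:start": int(start_offset),
        "oa:end": int(end_offset),
    }


def select_text(field_value, selector):
    """Return the anchor: the part of the field value the selector points to."""
    # Assumes an inclusive start and an exclusive end, like Python slices.
    return field_value[selector["oa:start"]:selector["oa:end"]]


# Hypothetical content of the dc:subject field; the real value is not shown.
subject = "Archaeology of Somerset"
selector = make_selector("0", "11")
anchor = select_text(subject, selector)
print(anchor)
```

The selector alone is not enough to recover the anchor: the field value has to be identified as well, which is the missing oa:hasSource resource discussed above.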

5 Conclusion

In this work we describe a method for representing automatically created PATHS annotations
in the EDM model. We first describe a simple way of representing the annotations and discuss
its benefits and drawbacks. One important weakness of the simple annotation schema lies in
its inability to represent attributes of annotations, such as confidence scores. To overcome this
limitation we propose a more complex solution that involves reifying the annotations as
instances of dedicated classes, and representing the annotation attributes as properties of
those instances. For this we have re-used elements from the Open Annotation (OA) model.
The method presented here, called EDMPaths, is able to properly represent the annotations
following EDM, but some information which was previously present in ESEPaths has been
left out. In particular, information regarding the offset of the anchor that caused the
annotation to be produced has proven difficult to represent.
One of our main design goals has been to avoid creating new non-standard classes and
properties when defining EDMPaths. We think we have largely succeeded in this respect,
mainly by reusing elements from initiatives such as the Open Annotation model. However, the
proposal still includes some properties which are specific to the PATHS project.

References
[Agirre and de Lacalle, 2011] Agirre, E. and de Lacalle, O. L. (2011). D2.1: Processing and
representation of content for first prototype. Technical report, PATHS project.
[Doerr et al., 2010] Doerr, M., Gradmann, S., Hennicke, S., Isaac, A., Meghini, C., and van de
Sompel, H. (2010). The Europeana Data Model (EDM). In World Library and Information
Congress: 76th IFLA General Conference and Assembly, pages 10–15.
[Otegi et al., 2012] Otegi, A., Agirre, E., and Soroa, A. (2012). D2.2: Processing and
representation of content for second prototype. Technical report, PATHS project.
12
• Event information associated with the item: CHOs often provide event- or activity-related
information, such as people walking, etc. We enrich the items by means of a predefined
list of words that can be used to refer to events. This data allows answering questions
like “give me items with people running”, “items with people playing”, etc.
• Related items: CH items which are semantically related.
• Background links that relate CH items with external resources such as Wikipedia. When
linking a CH item with some external resource, we keep track of the original text snippet from which the association is derived. For instance, an item could be related to a
Wikipedia article because of some text snippet of the dc:description field. In such case
we store the reference to the field and offset as attributes.2 (note that in some cases
however there is little point in keeping the text, because the enrichment is done based on
a combination of metadata fields)
The PATHS project started in 2011, and it adopted the representation schema of choice
then, ESE3 . We extended it extended to a format called ESEPaths to represent the enrichment
information just mentioned [Agirre and de Lacalle, 2011, Otegi et al., 2012]. In this document
we describe a proposal for representing PATHS enrichments following EDM (Europeana Data
Model), the new data model used by Europeana.
The document is structured as follows. We first introduce ESEPaths (Section 2), then
the roadmap for a simple conversion to EDM (Section 3). Section 4 explains some possible
(advanced) solutions to the problems identified in Section 3. Finally the conclusions are drawn.

2

ESEPaths

PATHS has defined a format derived from ESE, called ESEPaths, which adds the enrichment
information described above. Specifically, ESEPaths adds the following fields:
• <paths:informativeness> with the informativeness score of the ESE record.
• <paths:vocabulary>, which links the ESE record with vocabulary terms. The element
has the following attributes:
– name: name of the external vocabulary.
– URI: the address (URI) of the specific category in the vocabulary.
– confidence: the confidence of the association.
• <paths:event> which links the ESE record with external events. The element has the
following attributes:
– source: the name of the external resource of the event (for instance, WordNet).
– canonical_form: the canonical word form of the annotated event.
– confidence: confidence of the association.
2

Keeping track of this information, for instance, for an interface showing those annotations, as it can emphasize the specific snippet and link it to the Wikipedia/dbpedia article when the user points to it.
3
http://www.europeana.eu/schemas/ese/

2
<record>
<!-- Existing ESE record -->
<dc:identifier>http://www.thebowesmuseum.org.uk/10432/</dc:identifier>
<europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
<dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
<dc:description>This is a random-coursed blue lias ...</dc:description>
<dcterms:isPartOf>Bowes Museum</dcterms:isPartOf>
<dc:subject>1670</dc:subject>
<dc:type>Image</dc:type>
<europeana:provider>CultureGrid</europeana:provider>
<europeana:isShownAt>http://www.thebowesmuseum.org.uk/10432/</europeana:isShownAt>
<europeana:hasObject>false</europeana:hasObject>
<europeana:country>uk</europeana:country>
<europeana:type>IMAGE</europeana:type>
<europeana:language>en</europeana:language>
<!-- ESEPaths augmentation -->
<!-- item informativeness -->
<paths:informativeness>0.7</paths:informativeness>
<!-- vocabulary mapping -->
<paths:vocabulary confidence="0.8" source="wikicat"
URI="http://en.wikipedia.org/wiki/Category:Tower_mills">
Tower Mills</paths:vocabulary>
<!-- events -->
<paths:event confidence="0.8" source="wordnet" canonical_form="play"
start_offset="120" end_offset="127" field="dc:description">
playing</paths:event>
<!-- related items -->
<paths:related_item confidence="0.8" field="dc:subject" field_no="1"
method="LDA">
http://www.europeana.eu/portal/record/09405t/A6F9A
</paths:related_item>
<!-- background links items -->
<paths:background_link source="wikipedia" start_offset="0" end_offset="11"
field="dc:subject" confidence="0.015"
method="wikipedia-miner-1.2.0"
title="Archaeology">
http://en.wikipedia.org/wiki/Archaeology
</paths:background_link>
</record>

Figure 1: Example of an ESEPaths record

• <paths:related_item>, which links the ESE record with related CH items. The element
has the following attributes:
– confidence: confidence of the association.
– method: which method produced the association.
– field: the name of the ESE field whose content suggests the similarity relation.
– field_no: the position of the ESE field described above (useful in case the ESE
record contains more than one field with the same name).
• <paths:background_link>, which links the ESE record with an item from an external
resource. The element has the following attributes:
– source: the name of the external resource.
– start_offset: the offset (in characters) within the field element where the text
anchor begins.
– end_offset: the offset (in characters) within the field element where the text anchor
ends.
– field: the field of the ESE record where the anchor for this relation is located.
– confidence: confidence of the association.
– method: which method produced the association.
– title: title of the URL which the background link points to.
– sentiment: polarity of the textual information included in the corresponding link.
It has fixed values, namely “pos” for positive results, “neg” for negative and “neu” for
neutral.
Figure 1 shows an example of a CH record in ESEPaths. The first lines are just a copy of
the original ESE record, whereas the new elements (in the paths namespace) are at the end.
Note that the identifiers (including URIs) are not real and have been shortened so that the listing fits on the page.
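As an illustration of the element and attribute lists above, the following minimal Python sketch reads the paths elements out of an ESEPaths record. The namespace URI bound to the paths prefix is a placeholder assumed for this sketch (this note does not state it), and the function name read_paths_annotations is hypothetical.

```python
import xml.etree.ElementTree as ET

# The dc and europeana namespace URIs are the usual ones; the paths URI is a
# placeholder assumed for this sketch only.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "europeana": "http://www.europeana.eu/schemas/ese/",
    "paths": "http://www.paths-project.eu/ns/",  # hypothetical
}

# Shortened version of the record in Figure 1, with namespaces declared.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:europeana="http://www.europeana.eu/schemas/ese/"
        xmlns:paths="http://www.paths-project.eu/ns/">
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
  <paths:informativeness>0.7</paths:informativeness>
  <paths:vocabulary confidence="0.8" source="wikicat"
      URI="http://en.wikipedia.org/wiki/Category:Tower_mills">Tower Mills</paths:vocabulary>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>"""


def read_paths_annotations(xml_text):
    """Return (informativeness, annotations) for one ESEPaths record."""
    root = ET.fromstring(xml_text)
    info = root.find("paths:informativeness", NS)
    informativeness = float(info.text) if info is not None else None

    prefix = "{" + NS["paths"] + "}"
    annotations = []
    for child in root:
        if not child.tag.startswith(prefix):
            continue  # field of the original ESE record
        kind = child.tag[len(prefix):]
        if kind == "informativeness":
            continue  # already handled above
        annotations.append({
            "kind": kind,
            "value": (child.text or "").strip(),
            "attributes": dict(child.attrib),
        })
    return informativeness, annotations


informativeness, annotations = read_paths_annotations(RECORD)
for annotation in annotations:
    print(annotation["kind"], annotation["value"], annotation["attributes"])
```

Each annotation keeps its attributes as plain strings, so that a later conversion step can decide how (or whether) to represent them.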

3 Roadmap for basic conversion of ESEPaths to EDM

As said before, all the data produced by the PATHS project is encoded following the ESE
format extended with new elements. However, Europeana is switching from ESE to a new data
model, EDM. The main difference between ESE and EDM is that the latter is more expressive
and based on Semantic Web and Linked Data technologies (RDF, ontologies). In this section,
we outline the main design we have devised for switching from ESEPaths to EDM.
The main design criteria we have followed are the following:
1. All PATHS annotations should be properly represented using EDM.
2. It must be possible to retrieve particular PATHS annotations.
3. We should depart as little as possible from standard EDM.

The first criterion states that all PATHS annotations should be described using EDM. As
will be shown below, some annotation attributes are difficult to represent following EDM and,
as a consequence, a compromise has to be made between describing PATHS annotations in their
full richness and using proper EDM concepts and properties for representing them. The second
criterion states that the EDM representation has to respect the types of the PATHS annotations.
For instance, it has to be possible to retrieve all background links of a particular CH item (as
opposed to, say, its related items). Finally, the last criterion states that we should use widely
used EDM objects and properties as far as possible. In particular, the EDM representation should
use the set of elements described by Europeana’s instructions for providers4, when possible.
We now describe the main steps for converting the PATHS annotations to EDM.

From ESEPaths to EDM
We start by describing the resources which are already in Europeana. This includes a Europeana ore:Aggregation resource with information about the digital aggregation process itself
(provider, etc.)5.
<http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
edm:dataProvider "English Heritage - Viewfinder";
edm:provider "CultureGrid";
edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
edm:rights <http://www.europeana.eu/rights/rr-f/>.

Europeana also provides a proxy for the CHO, attached to this aggregation6 :
<http://data.europeana.eu/proxy/provider/09405/8F49> a ore:Proxy;
ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
# Original ESE data
dc:creator "Davies, J O";
dc:date "[2001]";
dc:title "Stembridge Windmill, High Ham, Somerset";
dc:description "This is a random-coursed blue lias ...".

We now describe the way to represent the enrichment annotations as provided by the PATHS
project. We encapsulate these annotations into a new ore:Aggregation. This aggregation
resource records a first set of enrichments created by the PATHS project over the original CH
object. It includes all relevant information like provider name, access rights, etc. as well as the
annotations referring to the whole CH object, as opposed to enrichment information extracted
from some subset of the CH object’s metadata.
4 http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders
5 The resource identifier of the aggregation used in the example is not real. The real one should be provided by Europeana.
6 Note again that the resource identifier of the proxy used in the example is not real.

<http://www.paths-project.eu/aggregation/paths/09405/8F49> a ore:Aggregation;
edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
edm:provider "PATHS";
edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
edm:rights <http://www.paths-project.eu/rights/rr-f/>;
# item informativeness
paths:informativeness "0.7".

There are some notes to be aware of:
• The isShownAt property points to the original record, as the PATHS project does not
store any information besides the proper enrichment of CH items.
• The edm:rights property refers to the annotated information (instead of the rights of
the original CH item).
• As said before, the paths:informativeness element pertains to the PATHS aggregation
resource because it refers to the CH object as a whole.
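The construction of the PATHS aggregation can be sketched as a small Python function that emits the Turtle statements shown above. This is only an illustration: the function name and its parameters are hypothetical, the URI patterns are copied from the (not real) identifiers used in the example, and prefix declarations are omitted as in the listings of this note.

```python
def paths_aggregation_turtle(record_id, informativeness, shown_at, rights):
    """Serialise the PATHS ore:Aggregation of one CH object as Turtle text."""
    # URI patterns taken from the example; the real ones may differ.
    aggregation = f"http://www.paths-project.eu/aggregation/paths/{record_id}"
    cho = f"http://data.europeana.eu/item/{record_id}"
    lines = [
        f"<{aggregation}> a ore:Aggregation ;",
        f"    edm:aggregatedCHO <{cho}> ;",
        '    edm:provider "PATHS" ;',
        # isShownAt still points to the original record.
        f"    edm:isShownAt <{shown_at}> ;",
        # rights of the annotations, not of the original CH item.
        f"    edm:rights <{rights}> ;",
        # whole-object annotation, hence attached to the aggregation.
        f'    paths:informativeness "{informativeness}" .',
    ]
    return "\n".join(lines)


turtle = paths_aggregation_turtle(
    "09405/8F49",
    0.7,
    "http://viewfinder.english-heritage.org.uk/imageUID=8",
    "http://www.paths-project.eu/rights/rr-f/",
)
print(turtle)
```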
Finally, we create a proxy resource for the PATHS aggregation and describe the remaining
paths annotations within the scope (as properties) of this resource:
<http://www.paths-project.eu/proxy/paths/09405/8F49> a ore:Proxy;
ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
ore:proxyIn <http://www.paths-project.eu/aggregation/paths/09405/8F49>;
# vocabulary mapping
edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
# events
edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
# related items
edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
# background links items
edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
# Or <http://dbpedia.org/resource/Archaeology>

Representing various types of enrichment. As shown in the example, the proxy resource
relates the CH item with external resources such as vocabulary concepts, events, related items
or objects from some external sources (such as Wikipedia or dbpedia). As all the associations
are described by means of the high-level edm:isRelatedTo property, it is necessary to properly
declare the types of the external objects related to the CH object. Otherwise, there would be
no way to discriminate among the different types of PATHS annotations (for instance, there
would be no way to specifically retrieve the vocabulary concepts related to a CH object). As a
first solution, we can include a separate description for the resources linked to the CH object
using SKOS7 .
Within PATHS we define the following types of external resources:
• Related CH items are of type paths:RelatedItemConcept, which is in turn a subclass
of skos:Concept.
• Vocabulary concepts are of type skos:Concept.
• Events are of type paths:EventConcept, a subclass of skos:Concept. It represents any
concept which refers to a (type of) event (such as “run”, “play”, etc.).
• Background links are of type paths:BackgroundLinkConcept, a subclass of skos:Concept.

7 http://www.w3.org/2004/02/skos

Note that these classes are meant to offer a way to discriminate among the different types
of annotations inside the PATHS project. The classes are therefore loosely defined, in the sense
that they do not describe the proper semantic type of the resources. For instance, PATHS
can relate a CH object with a dbpedia resource representing a place (New_York), a person
(Pablo_Picasso), etc. However, within the scope of the PATHS annotations, the only explicit
common type for all those resources can be inherited from their “background link” status.
Also note that, for the time being, Europeana would not be able to perfectly ingest data
that uses such sub-classes, as they depart from the set of elements described by Europeana’s
instructions for providers8. This would require Europeana to handle specialisations of EDM,
which is not precisely scheduled at the time of writing.
Based on the above, we also include the following statements in the example:
<http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
skos:prefLabel "Tower Mills"@en.
<http://www.paths-project.eu/event/playing> a paths:EventConcept;
skos:prefLabel "playing"@en.
<http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.
<http://en.wikipedia.org/wiki/Archaeology> a paths:BackgroundLinkConcept;
skos:prefLabel "Archaeology"@en.

along with the definitions of these new types:
paths:EventConcept a owl:Class ;
rdfs:subClassOf skos:Concept ;
rdfs:label "Event Concept"@en ;
skos:definition "A concept describing an Event"@en .
paths:RelatedItemConcept a owl:Class ;
rdfs:subClassOf skos:Concept ;
rdfs:label "Related Item Concept"@en ;
skos:definition "A concept describing a CH record"@en .
paths:BackgroundLinkConcept a owl:Class ;
rdfs:subClassOf skos:Concept ;
rdfs:label "Background Link Concept"@en ;
skos:definition "A concept describing an object from an external source such as dbpedia"@en .

The above definitions can be put next to the annotation data, in a separate file directly
provided to Europeana or others, or even served over the Web in a Linked Data scenario. The
whole EDM representation for the item is shown in Figure 2.
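To illustrate how the type declarations satisfy the second design criterion (retrieving particular PATHS annotations), the following Python sketch filters the edm:isRelatedTo links of the PATHS proxy by the declared type of the linked resource. The triples are written as plain tuples with prefixed names, and the helper related_of_type is a hypothetical name; a real system would use an RDF store and a query language instead.

```python
RDF_TYPE = "rdf:type"
PROXY = "http://www.paths-project.eu/proxy/paths/09405/8F49"

TOWER_MILLS = "http://www.paths-project.eu/vocabulary/Tower_mills"
PLAYING = "http://www.paths-project.eu/event/playing"
RELATED = "http://www.europeana.eu/portal/record/09405t/A6F9A"
ARCHAEOLOGY = "http://en.wikipedia.org/wiki/Archaeology"

# The statements of the example, as (subject, predicate, object) tuples.
TRIPLES = [
    (PROXY, "edm:isRelatedTo", TOWER_MILLS),
    (PROXY, "edm:isRelatedTo", PLAYING),
    (PROXY, "edm:isRelatedTo", RELATED),
    (PROXY, "edm:isRelatedTo", ARCHAEOLOGY),
    (TOWER_MILLS, RDF_TYPE, "skos:Concept"),
    (PLAYING, RDF_TYPE, "paths:EventConcept"),
    (RELATED, RDF_TYPE, "paths:RelatedItemConcept"),
    (ARCHAEOLOGY, RDF_TYPE, "paths:BackgroundLinkConcept"),
]


def related_of_type(triples, proxy, wanted_type):
    """Resources linked from proxy by edm:isRelatedTo with the declared type."""
    typed = {s for (s, p, o) in triples if p == RDF_TYPE and o == wanted_type}
    return [o for (s, p, o) in triples
            if s == proxy and p == "edm:isRelatedTo" and o in typed]


background_links = related_of_type(TRIPLES, PROXY, "paths:BackgroundLinkConcept")
vocabulary_terms = related_of_type(TRIPLES, PROXY, "skos:Concept")
print(background_links)
print(vocabulary_terms)
```

The sketch matches declared types only. Under RDFS inference, the three PATHS classes are subclasses of skos:Concept, so a query for skos:Concept would then return all four resources rather than the vocabulary concept alone.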
8 http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders

<http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
edm:dataProvider "English Heritage - Viewfinder";
edm:provider "CultureGrid";
edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
edm:rights <http://www.europeana.eu/rights/rr-f/>.
<http://data.europeana.eu/proxy/europeana/09405/8F49> a ore:Proxy;
ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
# Existing ESE record
dc:creator "Davies, J O";
dc:date "[2001]";
dc:title "Stembridge Windmill, High Ham, Somerset";
dc:description "This is a random-coursed blue lias ...".
<http://www.paths-project.eu/aggregation/europeana/09405/8F49> a ore:Aggregation;
edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
edm:provider "PATHS";
edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
edm:rights <http://www.paths-project.eu/rights/rr-f/>;
# item informativeness
paths:informativeness "0.7".
<http://www.paths-project.eu/proxy/europeana/09405/8F49> a ore:Proxy;
ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49>;
# vocabulary mapping
edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
# events
edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
# related items
edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
# background links items
edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
# Or <http://dbpedia.org/resource/Archaeology>
<http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
skos:prefLabel "Tower Mills"@en.
<http://www.paths-project.eu/event/playing> a paths:EventConcept;
skos:prefLabel "playing"@en.
<http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.

Figure 2: EDM representation of the ESEPaths example

Using specific metadata fields to represent enrichments. Alternatively, if a PATHS
enrichment is known to be certain, a new metadata field can be created for the CH object. For
instance, if the mapping of the CH record to a vocabulary concept is known to be correct, we can
create a new dc:subject field linking the CH record with the appropriate vocabulary concept.
Note however that PATHS enrichments are performed automatically, and it is not certain that
a concept enrichment derived from a dc:subject would result in a dc:subject relation between
the object and the concept. The link to the concept may have been identified based on only a
small part of the original field, thus missing some of the original semantics. Thus some manual
assessment has to be done in order to promote the annotation into a proper metadata field.

4 Using Open Annotation to represent attributes in relations

The roadmap described in the previous section covers the main aspects of ESEPaths. However,
there is one piece of ESEPaths data that cannot be easily represented in EDM, because EDM inherits
RDF’s focus on binary relations: attributes on relations. Almost all annotations created by the
PATHS project have some information associated with them. In particular, many annotations record
a confidence value, describing the level of certainty of the automatic method when creating the
annotation.
A way to overcome this limitation in an RDF-based model would be to reify the annotation into an instance of a dedicated class, and represent the annotation attributes using class
properties. For this we can re-use elements from the Open Annotation (OA) model9. Consider
this ESEPaths snippet:
<record>
...
<europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
<paths:background_link source="wikipedia" start_offset="0" end_offset="11"
field="dc:subject" confidence="0.015"
method="wikipedia-miner-1.2.0"
title="Archaeology">
http://en.wikipedia.org/wiki/Archaeology
</paths:background_link>
</record>

We would create the following oa:Annotation for it:
background_link1 a oa:Annotation ;
a paths:BackgroundLinkAnnotation ;
oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
# Or <http://dbpedia.org/resource/Archaeology>
paths:source <http://en.wikipedia.org> ;
# Or <http://dbpedia.org>
paths:confidence "0.015" .

9 http://www.openannotation.org/spec/core/

In the example, the <paths:background_link> annotation has been converted (reified) to
an oa:Annotation resource background_link1 of type paths:BackgroundLinkAnnotation,
linked by the oa:hasTarget relation to the PATHS proxy resource. The attributes of the original relation are now represented as properties of this new resource.
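The reification step can be sketched in Python as follows. This is a minimal illustration, not a definitive converter: the function name reify_background_link, the tuple representation of triples and the SOURCE_URIS lookup table are assumptions of the sketch, and only the source and confidence attributes used in the example above are carried over.

```python
# Mapping from the ESEPaths "source" attribute to a resource, following the
# example above; the table itself is an assumption of this sketch.
SOURCE_URIS = {"wikipedia": "http://en.wikipedia.org"}


def reify_background_link(annotation_id, proxy_uri, body_uri, attributes):
    """Turn one <paths:background_link> into triples about an oa:Annotation."""
    triples = [
        (annotation_id, "rdf:type", "oa:Annotation"),
        (annotation_id, "rdf:type", "paths:BackgroundLinkAnnotation"),
        (annotation_id, "oa:hasTarget", proxy_uri),
        (annotation_id, "oa:hasBody", body_uri),
    ]
    # Attributes of the original relation become properties of the annotation.
    source = attributes.get("source")
    if source in SOURCE_URIS:
        triples.append((annotation_id, "paths:source", SOURCE_URIS[source]))
    if "confidence" in attributes:
        triples.append((annotation_id, "paths:confidence", attributes["confidence"]))
    return triples


triples = reify_background_link(
    "background_link1",
    "http://www.paths-project.eu/proxy/europeana/09405/8F49",
    "http://en.wikipedia.org/wiki/Archaeology",
    {"source": "wikipedia", "confidence": "0.015", "field": "dc:subject"},
)
for triple in triples:
    print(triple)
```

Because the confidence is now a property of the annotation resource, a consumer can, for example, keep only the annotations whose confidence exceeds a threshold of its own choosing.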
An alternative to the above approach would be to use the OA “motivation” property for
representing the annotation. The OA motivation is meant to represent “the reasons why the
Annotation was created, not just the agents involved”10, which fits particularly well with the
kind of information we want to represent. The “motivation” approach would lead to the following
triples:
background_link1 a oa:Annotation ;
oa:motivatedBy paths:backgroundLinkMotivation ;
oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
# Or <http://dbpedia.org/resource/Archaeology>
paths:source <http://en.wikipedia.org> ;
# Or <http://dbpedia.org>
paths:confidence "0.015" .

In this case, the <paths:background_link> object is of type oa:Annotation, and it is also
oa:motivatedBy a paths:backgroundLinkMotivation, an instance of skos:Concept.
Both approaches described so far solve the main problem of attaching attributes to relations,
and also remove the need to define specific relations for PATHS such as paths:background_link,
which would conflict with the metadata fields currently used by EDM. Note however that the
properties of the newly defined reified annotations are still specific to PATHS (paths:source,
paths:confidence, etc.).
On a side note, using reified concepts for annotation raises the issue of whether we should
still keep the proxy-based representation next to it. Because now all the PATHS enrichment
data is attached to the reified annotation, the Proxy object described in Section 3 will convey
little or no information at all, compared to the original data.

4.1 Offsets and selectors

There is another piece of ESEPaths data which is not currently represented in EDMPaths,
namely the field and offset attributes of the relations. Because all PATHS annotations are
extracted from the textual content of some metadata field in the original CH record representation, ESEPaths annotations keep track of the original text snippet (called the anchor) which
was used to derive the enrichment.
In order to track this kind of provenance information, EDM could re-use the selectors from
the Open Annotation model11. For instance, consider the following ESEPaths snippet:
<record>
...
<europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
...
<paths:background_link start_offset="0" end_offset="11"
field="dc:subject" ... >
http://en.wikipedia.org/wiki/Archaeology
</paths:background_link>
</record>
10 http://www.openannotation.org/spec/core/core.html#Motivations
11 http://www.openannotation.org/spec/core/specific.html#Selectors

It describes a “background link” annotation for the CH object “09405/8F49” which was extracted by analyzing the offsets 0-11 of the dc:subject of the original record. These offsets
could be translated to the following Open Annotation snippet:
background_link1 a oa:Annotation ;
oa:hasTarget anchor1 ;
oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> .
anchor1 a oa:SpecificResource ;
oa:hasSource ??? ; # which type has this object ?
oa:hasSelector selector1 .
selector1 a oa:TextPositionSelector ;
oa:start 0 ;
oa:end 11 .

As noted in the snippet, our problem is then to define the type of the anchor1 resource.
This object should represent the dc:subject field of CH record “09405/8F49”, but there is
actually no way to describe this with EDM. We thus decided to leave this piece of information
out of our proposed solution.
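For illustration only, the relation between the offsets and the anchor can be sketched in Python. The sketch assumes that start_offset is inclusive and end_offset exclusive, which is consistent with the offsets in the examples of this note (0-11 for the 11-character string “Archaeology”) and with the way oa:TextPositionSelector defines oa:start and oa:end. The dc:subject value and the function names are hypothetical.

```python
def anchor_text(field_value, start_offset, end_offset):
    """Return the anchor: characters from start_offset up to, not including, end_offset."""
    if not 0 <= start_offset <= end_offset <= len(field_value):
        raise ValueError("offsets fall outside the field value")
    return field_value[start_offset:end_offset]


def text_position_selector(start_offset, end_offset):
    """Describe the same span with the properties of an oa:TextPositionSelector."""
    return {
        "rdf:type": "oa:TextPositionSelector",
        "oa:start": start_offset,
        "oa:end": end_offset,
    }


subject = "Archaeology; windmills"  # hypothetical dc:subject value
anchor = anchor_text(subject, 0, 11)
selector = text_position_selector(0, 11)
print(anchor)
print(selector)
```

The selector itself is straightforward to produce; as discussed above, the open question is which resource the selector should be applied to.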

5 Conclusion

In this work we describe a method for representing automatically created PATHS annotations
in the EDM model. We first describe a simple way of representing the annotations and discuss
its benefits and drawbacks. One important weakness of the simple annotation schema lies in
its inability to represent attributes of annotations, such as confidence scores. To overcome this
limitation we propose a more complex solution that involves reifying the annotations as
instances of dedicated classes, and representing the annotation attributes using class properties.
For this we have re-used elements from the Open Annotation (OA) model.
The method presented here, called EDMPaths, is able to properly represent the annotations
following EDM, but some information which was previously present following ESE has been
left out. In particular, information regarding the particular offset of the anchor that caused the
annotation to be produced has proven difficult to represent.
One of our main design goals has been to avoid creating new non-standard classes and
properties when defining EDMPaths. We think we have succeeded in this particular aspect,
mainly by reusing elements from initiatives such as the Open Annotation model. However, the
proposal describes some properties which are still specific to the PATHS project.

References
[Agirre and de Lacalle, 2011] Agirre, E. and de Lacalle, O. L. (2011). D2.1: Processing and
representation of content for first prototype. Technical report, PATHS project.
[Doerr et al., 2010] Doerr, M., Gradmann, S., Hennicke, S., Isaac, A., Meghini, C., and van de
Sompel, H. (2010). The europeana data model (EDM). In World Library and Information
Congress: 76th IFLA general conference and assembly, pages 10–15.
[Otegi et al., 2012] Otegi, A., Agirre, E., and Soroa, A. (2012). D2.2: Processing and representation of content for second prototype. Technical report, PATHS project.

Enrichment and EuropeanaEnrichment and Europeana
Enrichment and EuropeanaAntoine Isaac
 
Eprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, MexicoEprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, MexicoEduserv Foundation
 
Using Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked DataUsing Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked DataSheila Kinsella
 
Mapping the European(a) metadata landscape
Mapping the European(a) metadata landscapeMapping the European(a) metadata landscape
Mapping the European(a) metadata landscapeSally Chambers
 
Annotations Supporting Scholarly Editing
Annotations Supporting Scholarly EditingAnnotations Supporting Scholarly Editing
Annotations Supporting Scholarly EditingAnna Gerber
 
Annotating Digital Texts in the Brown University Library
Annotating Digital Texts in the Brown University LibraryAnnotating Digital Texts in the Brown University Library
Annotating Digital Texts in the Brown University LibraryTimothy Cole
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data toIJwest
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!Alex Kursov
 
Repositories and the wider context
Repositories and the wider contextRepositories and the wider context
Repositories and the wider contextJulie Allinson
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Jane Stevenson
 
20110324 linked openeuropeanahumanities
20110324 linked openeuropeanahumanities20110324 linked openeuropeanahumanities
20110324 linked openeuropeanahumanitiesStefan Gradmann
 
Annotations chicago
Annotations chicagoAnnotations chicago
Annotations chicagoTimothy Cole
 
ORE and SWAP: Composition and Complexity
ORE and SWAP: Composition and ComplexityORE and SWAP: Composition and Complexity
ORE and SWAP: Composition and ComplexityEduserv Foundation
 

Ähnlich wie Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment - Aitor Soroa, Eneko Agirre, Arantxa Otegi and Antoine Isaac (20)

Semantic Enrichment of Cultural Heritage content in PATHS
Semantic Enrichment of Cultural Heritage content in PATHSSemantic Enrichment of Cultural Heritage content in PATHS
Semantic Enrichment of Cultural Heritage content in PATHS
 
Enrichment and Europeana
Enrichment and EuropeanaEnrichment and Europeana
Enrichment and Europeana
 
ORE en Fedora Op Klompen
ORE en Fedora Op KlompenORE en Fedora Op Klompen
ORE en Fedora Op Klompen
 
Eprints Application Profile
Eprints Application ProfileEprints Application Profile
Eprints Application Profile
 
Eprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, MexicoEprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, Mexico
 
Using Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked DataUsing Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked Data
 
Mapping the European(a) metadata landscape
Mapping the European(a) metadata landscapeMapping the European(a) metadata landscape
Mapping the European(a) metadata landscape
 
Annotations Supporting Scholarly Editing
Annotations Supporting Scholarly EditingAnnotations Supporting Scholarly Editing
Annotations Supporting Scholarly Editing
 
Annotating Digital Texts in the Brown University Library
Annotating Digital Texts in the Brown University LibraryAnnotating Digital Texts in the Brown University Library
Annotating Digital Texts in the Brown University Library
 
Price "KBART: improving the supply of data to link resolvers and knowledge ba...
Price "KBART: improving the supply of data to link resolvers and knowledge ba...Price "KBART: improving the supply of data to link resolvers and knowledge ba...
Price "KBART: improving the supply of data to link resolvers and knowledge ba...
 
Price "KBART: Improving the Supply of Data to Link Resolvers and Knowledge Ba...
Price "KBART: Improving the Supply of Data to Link Resolvers and Knowledge Ba...Price "KBART: Improving the Supply of Data to Link Resolvers and Knowledge Ba...
Price "KBART: Improving the Supply of Data to Link Resolvers and Knowledge Ba...
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!
 
Repositories and the wider context
Repositories and the wider contextRepositories and the wider context
Repositories and the wider context
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011
 
20110324 linked openeuropeanahumanities
20110324 linked openeuropeanahumanities20110324 linked openeuropeanahumanities
20110324 linked openeuropeanahumanities
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Annotations chicago
Annotations chicagoAnnotations chicago
Annotations chicago
 
20110728 datalift-rpi-troy
20110728 datalift-rpi-troy20110728 datalift-rpi-troy
20110728 datalift-rpi-troy
 
ORE and SWAP: Composition and Complexity
ORE and SWAP: Composition and ComplexityORE and SWAP: Composition and Complexity
ORE and SWAP: Composition and Complexity
 

Mehr von pathsproject

Generating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paperGenerating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paperpathsproject
 
Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...pathsproject
 
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paperGenerating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paperpathsproject
 
PATHS at the eChallenges conference
PATHS at the eChallenges conferencePATHS at the eChallenges conference
PATHS at the eChallenges conferencepathsproject
 
PATHS at the EAA conference 2013
PATHS at the EAA conference 2013PATHS at the EAA conference 2013
PATHS at the EAA conference 2013pathsproject
 
PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013pathsproject
 
Comparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentationComparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentationpathsproject
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similaritypathsproject
 
A pilot on Semantic Textual Similarity
A pilot on Semantic Textual SimilarityA pilot on Semantic Textual Similarity
A pilot on Semantic Textual Similaritypathsproject
 
Comparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documentsComparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documentspathsproject
 
PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0pathsproject
 
PATHS Evaluation of the 1st paths prototype
PATHS Evaluation of the 1st paths prototypePATHS Evaluation of the 1st paths prototype
PATHS Evaluation of the 1st paths prototypepathsproject
 
PATHS Second prototype-functional-spec
PATHS Second prototype-functional-specPATHS Second prototype-functional-spec
PATHS Second prototype-functional-specpathsproject
 
PATHS Final state of art monitoring report v0_4
PATHS  Final state of art monitoring report v0_4PATHS  Final state of art monitoring report v0_4
PATHS Final state of art monitoring report v0_4pathsproject
 
PATHS first paths prototype
PATHS first paths prototypePATHS first paths prototype
PATHS first paths prototypepathsproject
 
PATHS Content processing 2nd prototype-revised.v2
PATHS Content processing 2nd prototype-revised.v2PATHS Content processing 2nd prototype-revised.v2
PATHS Content processing 2nd prototype-revised.v2pathsproject
 
PATHS Content processing 1st prototype
PATHS  Content processing 1st prototypePATHS  Content processing 1st prototype
PATHS Content processing 1st prototypepathsproject
 
PATHS system architecture
PATHS system architecturePATHS system architecture
PATHS system architecturepathsproject
 
PATHS Functional specification first prototype
PATHS Functional specification first prototypePATHS Functional specification first prototype
PATHS Functional specification first prototypepathsproject
 
PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0pathsproject
 

Mehr von pathsproject (20)

Generating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paperGenerating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paper
 
Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...
 
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paperGenerating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
 
PATHS at the eChallenges conference
PATHS at the eChallenges conferencePATHS at the eChallenges conference
PATHS at the eChallenges conference
 
PATHS at the EAA conference 2013
PATHS at the EAA conference 2013PATHS at the EAA conference 2013
PATHS at the EAA conference 2013
 
PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013
 
Comparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentationComparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentation
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
A pilot on Semantic Textual Similarity
A pilot on Semantic Textual SimilarityA pilot on Semantic Textual Similarity
A pilot on Semantic Textual Similarity
 
Comparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documentsComparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documents
 
PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0
 
PATHS Evaluation of the 1st paths prototype
PATHS Evaluation of the 1st paths prototypePATHS Evaluation of the 1st paths prototype
PATHS Evaluation of the 1st paths prototype
 
PATHS Second prototype-functional-spec
PATHS Second prototype-functional-specPATHS Second prototype-functional-spec
PATHS Second prototype-functional-spec
 
PATHS Final state of art monitoring report v0_4
PATHS  Final state of art monitoring report v0_4PATHS  Final state of art monitoring report v0_4
PATHS Final state of art monitoring report v0_4
 
PATHS first paths prototype
PATHS first paths prototypePATHS first paths prototype
PATHS first paths prototype
 
PATHS Content processing 2nd prototype-revised.v2
PATHS Content processing 2nd prototype-revised.v2PATHS Content processing 2nd prototype-revised.v2
PATHS Content processing 2nd prototype-revised.v2
 
PATHS Content processing 1st prototype
PATHS  Content processing 1st prototypePATHS  Content processing 1st prototype
PATHS Content processing 1st prototype
 
PATHS system architecture
PATHS system architecturePATHS system architecture
PATHS system architecture
 
PATHS Functional specification first prototype
PATHS Functional specification first prototypePATHS Functional specification first prototype
PATHS Functional specification first prototype
 
PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0
 

Kürzlich hochgeladen

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment - Aitor Soroa, Eneko Agirre, Arantxa Otegi and Antoine Isaac

  • 1. Personalised access to cultural heritage spaces
    Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment
    Authors: Aitor Soroa, Eneko Agirre, Arantxa Otegi and Antoine Isaac
    www.paths-project.eu
  • 2. Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment
    Aitor Soroa, Eneko Agirre, Arantxa Otegi, Antoine Isaac
    February 10, 2014

    Contents
    1 Introduction
    2 ESEPaths
    3 Roadmap for basic conversion of ESEPaths to EDM
    4 Using Open Annotation to represent attributes in relations
    4.1 Offsets and selectors
    5 Conclusion

    1 Introduction

    This document is a case study on using the Europeana Data Model (EDM) [Doerr et al., 2010]¹ for representing annotations of Cultural Heritage Objects (CHO). One of the main goals of the PATHS project is to augment CHOs (items) with information that will enrich the user's experience. The additional information includes links between items in cultural collections and links from items to external sources such as Wikipedia. To this end, the PATHS project has applied Natural Language Processing (NLP) techniques to a subset of the items in Europeana. Using these techniques, PATHS enriches CH items with the following information [Agirre and de Lacalle, 2011, Otegi et al., 2012]:

    • Informativeness score: each item is associated with a value indicating its overall "informativeness", which is derived from the amount of text in its metadata and is inversely proportional to the number of items in which the same text is mentioned.
    • Vocabulary terms: vocabulary terms associated with the item. These terms are used for creating the tag clouds shown to the user.

    ¹ http://pro.europeana.eu/edm-documentation
  • 4. • Event information associated with the item: CHOs often provide event- or activity-related information, such as people walking. We enrich the items by means of a predefined list of words that can be used to refer to events. This data allows answering questions like "give me items with people running" or "items with people playing".
    • Related items: CH items which are semantically related.
    • Background links that relate CH items with external resources such as Wikipedia.

    When linking a CH item with an external resource, we keep track of the original text snippet from which the association is derived. For instance, an item could be related to a Wikipedia article because of a text snippet in the dc:description field. In such cases we store the reference to the field and the offset as attributes.² Note, however, that in some cases there is little point in keeping the text, because the enrichment is based on a combination of metadata fields.

    The PATHS project started in 2011 and adopted ESE,³ the representation schema of choice at the time. We extended it to a format called ESEPaths to represent the enrichment information just mentioned [Agirre and de Lacalle, 2011, Otegi et al., 2012]. In this document we describe a proposal for representing PATHS enrichments following EDM (Europeana Data Model), the new data model used by Europeana.

    The document is structured as follows. We first introduce ESEPaths (Section 2), then the roadmap for a simple conversion to EDM (Section 3). Section 4 explains some possible (advanced) solutions to the problems identified in Section 3. Finally, conclusions are drawn in Section 5.

    2 ESEPaths

    PATHS has defined a format derived from ESE, called ESEPaths, which adds the enrichment information described above. Specifically, ESEPaths adds the following fields:

    • <paths:informativeness>, with the informativeness score of the ESE record.
    • <paths:vocabulary>, which links the ESE record with vocabulary terms. The element has the following attributes:
      – name: name of the external vocabulary.
      – URI: the address (URI) of the specific category in the vocabulary.
      – confidence: the confidence of the association.
    • <paths:event>, which links the ESE record with external events. The element has the following attributes:
      – source: the name of the external resource of the event (for instance, WordNet).
      – canonical_form: the canonical word form of the annotated event.
      – confidence: confidence of the association.

    ² Keeping track of this information is useful, for instance, for an interface showing those annotations, as it can emphasize the specific snippet and link it to the Wikipedia/dbpedia article when the user points to it.
    ³ http://www.europeana.eu/schemas/ese/
<record>
  <!-- Existing ESE record -->
  <dc:identifier>http://www.thebowesmuseum.org.uk/10432/</dc:identifier>
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
  <dc:description>This is a random-coursed blue lias ...</dc:description>
  <dcterms:isPartOf>Bowes Museum</dcterms:isPartOf>
  <dc:subject>1670</dc:subject>
  <dc:type>Image</dc:type>
  <europeana:provider>CultureGrid</europeana:provider>
  <europeana:isShownAt>http://www.thebowesmuseum.org.uk/10432/</europeana:isShownAt>
  <europeana:hasObject>false</europeana:hasObject>
  <europeana:country>uk</europeana:country>
  <europeana:type>IMAGE</europeana:type>
  <europeana:language>en</europeana:language>

  <!-- ESEPaths augmentation -->
  <!-- item informativeness -->
  <paths:informativeness>0.7</paths:informativeness>
  <!-- vocabulary mapping -->
  <paths:vocabulary confidence="0.8" source="wikicat"
      URI="http://en.wikipedia.org/wiki/Category:Tower_mills">
    Tower Mills</paths:vocabulary>
  <!-- events -->
  <paths:event confidence="0.8" source="wordnet" canonical_form="play"
      start_offset="120" end_offset="127" field="dc:description">
    playing</paths:event>
  <!-- related items -->
  <paths:related_item confidence="0.8" field="dc:subject" field_no="1" method="LDA">
    http://www.europeana.eu/portal/record/09405t/A6F9A
  </paths:related_item>
  <!-- background links items -->
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>

Figure 1: Example of an ESEPaths record
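As an illustration of how a record like the one in Figure 1 might be consumed, the following minimal Python sketch collects every element in the paths namespace together with its attributes and text. It is only a sketch under stated assumptions: the namespace URI bound to the paths prefix is hypothetical (the figure does not declare one), and the record is cut down to a few fields so that the listing stays short.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "paths" prefix; the source does not state it.
PATHS_NS = "http://www.paths-project.eu/ns/esepaths#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Cut-down version of the record in Figure 1, with namespace declarations added.
RECORD = f"""<record xmlns:dc="{DC_NS}" xmlns:paths="{PATHS_NS}">
  <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
  <dc:subject>1670</dc:subject>
  <paths:informativeness>0.7</paths:informativeness>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>"""


def read_enrichments(xml_text):
    """Return (element name, attributes, text) for every paths:* child."""
    root = ET.fromstring(xml_text)
    prefix = "{" + PATHS_NS + "}"
    found = []
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            found.append((name, dict(child.attrib), (child.text or "").strip()))
    return found


enrichments = read_enrichments(RECORD)
for name, attrs, text in enrichments:
    print(name, attrs.get("confidence"), text)
```

Keeping the attributes in a plain dictionary means that no attribute is lost at this stage, whatever the target representation later turns out to support.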
• <paths:related_item>, which links the ESE record with related CH items. The element has the following attributes:
  – confidence: confidence of the association.
  – method: which method produced the association.
  – field: the name of the ESE field whose content suggests the similarity relation.
  – field_no: the position of the ESE field described above (useful in case the ESE record contains more than one field with the same name).
• <paths:background_link>, which links the ESE record with an item from an external resource. The element has the following attributes:
  – source: the name of the external resource.
  – start_offset: the offset (in characters) within the field element where the text anchor begins.
  – end_offset: the offset (in characters) within the field element where the text anchor ends.
  – field: the field of the ESE record where the anchor for this relation is located.
  – confidence: confidence of the association.
  – method: which method produced the association.
  – title: title of the URL which the background link points to.
  – sentiment: polarity of the textual information included in the corresponding link. It has fixed values, namely “pos” for positive results, “neg” for negative and “neu” for neutral.

Figure 1 shows an example of a CH record in ESEPaths. The first lines are just a copy of the original ESE record, whereas the new elements (in the paths namespace) are at the end. Note that the identifiers (including URIs) are not real, and have been shortened so that the listing fits on the page.

3 Roadmap for basic conversion of ESEPaths to EDM

As said before, all the data produced by the PATHS project is encoded following the ESE format extended with new elements. However, Europeana is switching from ESE to a new data model, EDM. The main difference between ESE and EDM is that the latter is more expressive and based on Semantic Web and Linked Data technologies (RDF, ontologies). In this section, we outline the design we propose for switching from ESEPaths to EDM.
The main design criteria we have followed are the following:
1. All PATHS annotations should be properly represented using EDM.
2. It must be possible to retrieve particular PATHS annotations.
3. We should depart as little as possible from standard EDM.
The first criterion states that all PATHS annotations should be described using EDM. As will be shown below, some annotation attributes are difficult to represent following EDM and, as a consequence, a compromise has to be made between describing PATHS annotations in their full richness and using proper EDM concepts and properties for representing them.

The second criterion states that the EDM representation has to respect the types of the PATHS annotations. For instance, it has to be possible to retrieve all background links of a particular CH item (as opposed to, say, its related items).

Finally, the last criterion states that we should use widely used EDM objects and properties as far as possible. In particular, the EDM representation should use the set of elements described by Europeana’s instructions for providers⁴ when possible.

We now describe the main steps for converting the PATHS annotations to EDM.

From ESEPaths to EDM

We start by describing the resources which are already in Europeana. This includes a Europeana ore:Aggregation resource with information about the digital aggregation process itself (provider, etc.)⁵.

<http://data.europeana.eu/aggregation/provider/09405/8F49>
    a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:dataProvider "English Heritage - Viewfinder";
    edm:provider "CultureGrid";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
    edm:rights <http://www.europeana.eu/rights/rr-f/>.

Europeana also provides a proxy for the CHO, attached to this aggregation⁶:

<http://data.europeana.eu/proxy/provider/09405/8F49>
    a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
    # Original ESE data
    dc:creator "Davies, J O";
    dc:date "[2001]";
    dc:title "Stembridge Windmill, High Ham, Somerset";
    dc:description "This is a random-coursed blue lias ...".
⁴ http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders
⁵ The resource identifier of the aggregation used in the example is not real. The real one should be provided by Europeana.
⁶ Note again that the resource identifier of the proxy used in the example is not real.

We now describe the way to represent the enrichment annotations provided by the PATHS project. We encapsulate these annotations in a new ore:Aggregation. This aggregation resource records a first set of enrichments created by the PATHS project over the original CH object. It includes all relevant information, such as provider name and access rights, as well as the annotations referring to the whole CH object, as opposed to enrichment information extracted from some subset of the CH object’s metadata.
<http://www.paths-project.eu/aggregation/paths/09405/8F49>
    a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:provider "PATHS";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:rights <http://www.paths-project.eu/rights/rr-f/>;
    # item informativeness
    paths:informativeness "0.7".

A few points should be noted:
• The edm:isShownAt property points to the original record, as the PATHS project does not store any information besides the enrichment of CH items itself.
• The edm:rights property refers to the annotated information (instead of the rights of the original CH item).
• As said before, the paths:informativeness element pertains to the PATHS aggregation resource because it refers to the CH object as a whole.

Finally, we create a proxy resource for the PATHS aggregation and describe the remaining PATHS annotations within the scope of this resource, as its properties:

<http://www.paths-project.eu/proxy/paths/09405/8F49>
    a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://www.paths-project.eu/aggregation/paths/09405/8F49>;
    # vocabulary mapping
    edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
    # events
    edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
    # related items
    edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
    # background links items
    edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
    # Or <http://dbpedia.org/resource/Archaeology>

Representing various types of enrichment. As shown in the example, the proxy resource relates the CH item with external resources such as vocabulary concepts, events, related items or objects from some external sources (such as Wikipedia or dbpedia). As all the associations are described by means of the high-level edm:isRelatedTo property, it is necessary to properly declare the types of the external objects related to the CH object.
Otherwise, there would be no way to discriminate among the different types of PATHS annotations (for instance, there would be no way to specifically retrieve the vocabulary concepts related to a CH object). As a first solution, we can include a separate description for the resources linked to the CH object using SKOS⁷. Within PATHS we define the following types of external resources:

⁷ http://www.w3.org/2004/02/skos

• Related CH items are of type paths:RelatedItemConcept, which is in turn a subclass of skos:Concept.
• Vocabulary concepts are of type skos:Concept.
• Events are of type paths:EventConcept, a subclass of skos:Concept. It represents any concept which refers to a (type of) event (such as “run”, “play”, etc.).
• Background links are of type paths:BackgroundLinkConcept, a subclass of skos:Concept.

Note that these classes are meant to offer a way to discriminate among the different types of annotations inside the PATHS project. The classes are therefore loosely defined, in the sense that they do not describe the proper semantic type of the resources. For instance, PATHS can relate a CH object with a dbpedia resource representing a place (New_York), a person (Pablo_Picasso), etc. However, within the scope of the PATHS annotations, the only explicit common type for all those resources is the one inherited from their “background link” status.

Also note that, for the time being, Europeana would not be able to fully ingest data that uses such sub-classes, as they depart from the set of elements described by Europeana’s instructions for providers⁸. This would require Europeana to handle specialisations of EDM, which is not precisely scheduled at the time of writing.

Based on the above, we also include the following statements in the example:

<http://www.paths-project.eu/vocabulary/Tower_mills>
    a skos:Concept;
    skos:prefLabel "Tower Mills"@en.

<http://www.paths-project.eu/event/playing>
    a paths:EventConcept;
    skos:prefLabel "playing"@en.

<http://www.europeana.eu/portal/record/09405t/A6F9A>
    a paths:RelatedItemConcept.

<http://en.wikipedia.org/wiki/Archaeology>
    a paths:BackgroundLinkConcept;
    skos:prefLabel "Archaeology"@en.

along with the definitions of these new types:

paths:EventConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Event Concept"@en ;
    skos:definition "A concept describing an Event"@en .

paths:RelatedItemConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Related Item Concept"@en ;
    skos:definition "A concept describing a CH record"@en .
paths:BackgroundLinkConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Background Link Concept"@en ;
    skos:definition "A concept describing an object from an external source such as dbpedia"@en .

⁸ http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders

The above definitions can be put next to the annotation data, in a separate file directly provided to Europeana or others, or even served over the Web in a Linked Data scenario. The whole EDM representation for the item is shown in Figure 2.
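To make the role of these type declarations concrete, here is a minimal Python sketch of the retrieval required by the second design criterion: the resources related to a proxy are filtered by their declared type. The triples are written as plain tuples with prefixed names rather than loaded from an RDF store, which is a simplification made for this illustration, and the function name is hypothetical.

```python
EDM_RELATED = "edm:isRelatedTo"
RDF_TYPE = "rdf:type"
PROXY = "http://www.paths-project.eu/proxy/paths/09405/8F49"

TOWER_MILLS = "http://www.paths-project.eu/vocabulary/Tower_mills"
PLAYING = "http://www.paths-project.eu/event/playing"
RELATED = "http://www.europeana.eu/portal/record/09405t/A6F9A"
ARCHAEOLOGY = "http://en.wikipedia.org/wiki/Archaeology"

# Triples from the running example, as (subject, predicate, object).
TRIPLES = [
    (PROXY, EDM_RELATED, TOWER_MILLS),
    (PROXY, EDM_RELATED, PLAYING),
    (PROXY, EDM_RELATED, RELATED),
    (PROXY, EDM_RELATED, ARCHAEOLOGY),
    (TOWER_MILLS, RDF_TYPE, "skos:Concept"),
    (PLAYING, RDF_TYPE, "paths:EventConcept"),
    (RELATED, RDF_TYPE, "paths:RelatedItemConcept"),
    (ARCHAEOLOGY, RDF_TYPE, "paths:BackgroundLinkConcept"),
]


def related_of_type(triples, proxy, wanted_type):
    """Resources the proxy is related to that are declared with wanted_type."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == wanted_type}
    return [
        o for s, p, o in triples
        if s == proxy and p == EDM_RELATED and o in typed
    ]


background_links = related_of_type(TRIPLES, PROXY, "paths:BackgroundLinkConcept")
events = related_of_type(TRIPLES, PROXY, "paths:EventConcept")
print(background_links, events)
```

No subclass reasoning is applied in this sketch, so asking for skos:Concept returns only the vocabulary term. A store that applies the rdfs:subClassOf statements would return all four resources for that query, which is something to keep in mind when vocabulary concepts are typed with the superclass itself.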
<http://data.europeana.eu/aggregation/provider/09405/8F49>
    a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:dataProvider "English Heritage - Viewfinder";
    edm:provider "CultureGrid";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
    edm:rights <http://www.europeana.eu/rights/rr-f/>.

<http://data.europeana.eu/proxy/europeana/09405/8F49>
    a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
    # Existing ESE record
    dc:creator "Davies, J O";
    dc:date "[2001]";
    dc:title "Stembridge Windmill, High Ham, Somerset";
    dc:description "This is a random-coursed blue lias ...".

<http://www.paths-project.eu/aggregation/europeana/09405/8F49>
    a ore:Aggregation;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
    edm:provider "PATHS";
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
    edm:rights <http://www.paths-project.eu/rights/rr-f/>;
    # item informativeness
    paths:informativeness "0.7".

<http://www.paths-project.eu/proxy/europeana/09405/8F49>
    a ore:Proxy;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
    ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49>;
    # vocabulary mapping
    edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
    # events
    edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
    # related items
    edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
    # background links items
    edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
    # Or <http://dbpedia.org/resource/Archaeology>

<http://www.paths-project.eu/vocabulary/Tower_mills>
    a skos:Concept;
    skos:prefLabel "Tower Mills"@en.

<http://www.paths-project.eu/event/playing>
    a paths:EventConcept;
    skos:prefLabel "playing"@en.

<http://www.europeana.eu/portal/record/09405t/A6F9A>
    a paths:RelatedItemConcept.

Figure 2: EDM representation of the ESEPaths example
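The basic conversion that leads from Figure 1 to Figure 2 could be sketched in Python as follows. This is an illustration under assumptions rather than the project's actual converter: the function names are hypothetical, and the rules used to mint URIs for vocabulary terms and events are only inferred from the example identifiers, which are themselves not real.

```python
BASE = "http://www.paths-project.eu"

# Type declared for the target of each kind of enrichment.
TYPE_OF = {
    "vocabulary": "skos:Concept",
    "event": "paths:EventConcept",
    "related_item": "paths:RelatedItemConcept",
    "background_link": "paths:BackgroundLinkConcept",
}


def target_uri(kind, attrs, text):
    """Choose the resource an enrichment points to (minting rules are assumed)."""
    if kind == "vocabulary":
        # Assumed rule: reuse the part of the category URI after the last colon.
        return f"{BASE}/vocabulary/{attrs['URI'].rsplit(':', 1)[-1]}"
    if kind == "event":
        return f"{BASE}/event/{text}"
    return text  # related items and background links already carry a URI


def to_turtle(record_id, enrichments):
    """Emit the PATHS proxy and the type statements for its related resources."""
    proxy = f"<{BASE}/proxy/paths/{record_id}>"
    statements = [
        "a ore:Proxy",
        f"ore:proxyFor <http://data.europeana.eu/item/{record_id}>",
        f"ore:proxyIn <{BASE}/aggregation/paths/{record_id}>",
    ]
    typed = []
    for kind, attrs, text in enrichments:
        if kind not in TYPE_OF:
            continue  # e.g. informativeness belongs to the aggregation
        uri = target_uri(kind, attrs, text)
        statements.append(f"edm:isRelatedTo <{uri}>")
        typed.append(f"<{uri}> a {TYPE_OF[kind]} .")
    body = " ;\n    ".join(statements)
    return "\n".join([f"{proxy}\n    {body} ."] + typed)


example = [
    ("informativeness", {}, "0.7"),
    ("vocabulary",
     {"URI": "http://en.wikipedia.org/wiki/Category:Tower_mills", "confidence": "0.8"},
     "Tower Mills"),
    ("background_link", {"confidence": "0.015"},
     "http://en.wikipedia.org/wiki/Archaeology"),
]
turtle = to_turtle("09405/8F49", example)
print(turtle)
```

Note that attributes such as confidence are read but never written: a plain edm:isRelatedTo statement has no place for them.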
Using specific metadata fields to represent enrichments

Alternatively, if a PATHS enrichment is known to be certain, a new metadata field can be created for the CH object. For instance, if the mapping of the CH record to a vocabulary concept is known to be correct, we can create a new dc:subject field linking the CH record with the appropriate vocabulary concept. Note, however, that PATHS enrichments are produced automatically, and it is not certain that a concept enrichment derived from a dc:subject would result in a dc:subject relation between the object and the concept. The link to the concept may have been identified based on only a small part of the original field, thus missing some of the original semantics. Some manual assessment is therefore needed in order to promote the annotation into a proper metadata field.

4 Using Open Annotation to represent attributes in relations

The roadmap described in the previous section covers the main aspects of ESEPaths. However, there is a first piece of ESEPaths data that cannot easily be represented in EDM, because EDM inherits RDF’s focus on binary relations: attributes on relations. Almost all annotations created by the PATHS project have some information associated with them. In particular, many annotations record a confidence value, describing the level of certainty of the automatic method when creating the annotation.

A way to overcome this limitation in an RDF-based model would be to reify the annotation into an instance of a dedicated class, and represent the annotation attributes as properties of that instance. For this we can re-use elements from the Open Annotation (OA) model⁹. Consider this ESEPaths snippet:

<record>
  ...
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>

We would create the following oa:Annotation for it:

background_link1 a oa:Annotation ;
    a paths:BackgroundLinkAnnotation ;
    oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;  # Or <http://dbpedia.org/resource/Archaeology>
    paths:source <http://en.wikipedia.org> ;  # Or <dbpedia.org>
    paths:confidence "0.015" .

In the example, the <paths:background_link> annotation has been converted (reified) to an oa:Annotation resource background_link1 of type paths:BackgroundLinkAnnotation, linked by the oa:hasTarget relation to the PATHS proxy resource. The attributes of the original relation are now represented as properties of this new resource.

An alternative to the above approach would be to use the OA “motivation” property for representing the annotation. The OA motivation is meant to represent “the reasons why the Annotation was created, not just the agents involved”¹⁰, which fits particularly well with the kind of information we want to represent. The “motivation” approach would lead to the following triples:

background_link1 a oa:Annotation ;
    oa:motivatedBy paths:backgroundLinkMotivation ;
    oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;  # Or <http://dbpedia.org/resource/Archaeology>
    paths:source <http://en.wikipedia.org> ;  # Or <dbpedia.org>
    paths:confidence "0.015" .

In this case, the <paths:background_link> object is of type oa:Annotation, and it is also oa:motivatedBy a paths:backgroundLinkMotivation, an instance of skos:Concept.

Both approaches described so far solve the main problem of attaching attributes to relations, and also avoid the need to define PATHS-specific relations such as paths:background_link, which would conflict with the metadata fields currently used by EDM. Note, however, that the properties of the newly defined reified annotations are still specific to PATHS (paths:source, paths:confidence, etc.).

On a side note, using reified concepts for annotation raises the issue of whether we should still keep the proxy-based representation next to it. Because all the PATHS enrichment data is now attached to the reified annotation, the Proxy object described in Section 3 will convey little or no information compared to the original data.

⁹ http://www.openannotation.org/spec/core/

4.1 Offsets and selectors

There is another piece of ESEPaths data which is not currently represented in EDMPaths, namely the field and offset attributes of the relations.
Because all PATHS annotations are extracted from the textual content of some metadata field in the original CH record representation, ESEPaths annotations keep track of the original text snippet (called the anchor) which was used to derive the enrichment. In order to track this kind of provenance information, EDM could re-use the selectors from the Open Annotation model¹¹. For instance, consider the following ESEPaths snippet:

<record>
  ...
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  ...
  <paths:background_link start_offset="0" end_offset="11" field="dc:subject" ... >
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>

¹⁰ http://www.openannotation.org/spec/core/core.html#Motivations
¹¹ http://www.openannotation.org/spec/core/specific.html#Selectors

It describes a “background link” annotation for the CH object “09405/8F49” which was extracted by analyzing the offsets 0-11 of the dc:subject field of the original record. These offsets could be translated to the following Open Annotation snippet:

background_link1 a oa:Annotation ;
    oa:hasTarget anchor1 ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> .

anchor1 a oa:SpecificResource ;
    oa:hasSource ??? ;  # what type does this object have?
    oa:hasSelector selector1 .

selector1 a oa:TextPositionSelector ;
    oa:start 0 ;
    oa:end 11 .

As noted in the snippet, our problem is then to define the source of the anchor1 resource. This object should represent the dc:subject field of CH record “09405/8F49”, but there is actually no way to describe this with EDM. We thus decided to leave this piece of information out of our proposed solution.

5 Conclusion

In this work we describe a method for representing automatically created PATHS annotations in EDM. We first describe a simple way of representing the annotations and discuss its benefits and drawbacks. One important weakness of the simple annotation schema lies in its inability to represent attributes of annotations, such as confidence scores. To overcome this limitation we propose a more complex solution that involves reifying the annotations as instances of dedicated classes, and representing the annotation attributes as properties of those instances. For this we have re-used elements from the Open Annotation (OA) model.

The method presented here, called EDMPaths, is able to properly represent the annotations following EDM, but some information which was previously present in ESEPaths has been left out. In particular, information regarding the offset of the anchor that caused the annotation to be produced has proven difficult to represent.

One of our main design goals has been to avoid creating new non-standard classes and properties when defining EDMPaths.
We think we have succeeded in this particular aspect, mainly by reusing elements from initiatives such as the Open Annotation model. However, the proposal describes some properties which are still specific to the PATHS project.

References

[Agirre and de Lacalle, 2011] Agirre, E. and de Lacalle, O. L. (2011). D2.1: Processing and representation of content for first prototype. Technical report, PATHS project.

[Doerr et al., 2010] Doerr, M., Gradmann, S., Hennicke, S., Isaac, A., Meghini, C., and van de Sompel, H. (2010). The Europeana Data Model (EDM). In World Library and Information Congress: 76th IFLA General Conference and Assembly, pages 10–15.

[Otegi et al., 2012] Otegi, A., Agirre, E., and Soroa, A. (2012). D2.2: Processing and representation of content for second prototype. Technical report, PATHS project.
  • 15. • Event information associated with the item: CHOs often provide event- or activity-related information, such as people walking, etc. We enrich the items by means of a predefined list of words that can be used to refer to events. This data allows answering questions like “give me items with people running”, “items with people playing”, etc. • Related items: CH items which are semantically related. • Background links that relate CH items with external resources such as Wikipedia. When linking a CH item with some external resource, we keep track of the original text snippet from which the association is derived. For instance, an item could be related to a Wikipedia article because of some text snippet of the dc:description field. In such case we store the reference to the field and offset as attributes.2 (note that in some cases however there is little point in keeping the text, because the enrichment is done based on a combination of metadata fields) The PATHS project started in 2011, and it adopted the representation schema of choice then, ESE3 . We extended it extended to a format called ESEPaths to represent the enrichment information just mentioned [Agirre and de Lacalle, 2011, Otegi et al., 2012]. In this document we describe a proposal for representing PATHS enrichments following EDM (Europeana Data Model), the new data model used by Europeana. The document is structured as follows. We first introduce ESEPaths (Section 2), then the roadmap for a simple conversion to EDM (Section 3). Section 4 explains some possible (advanced) solutions to the problems identified in Section 3. Finally the conclusions are drawn. 2 ESEPaths PATHS has defined a format derived from ESE, called ESEPaths, which adds the enrichment information described above. Specifically, ESEPaths adds the following fields: • <paths:informativeness> with the informativeness score of the ESE record. • <paths:vocabulary>, which links the ESE record with vocabulary terms. 
The element has the following attributes: – name: name of the external vocabulary. – URI: the address (URI) of the specific category in the vocabulary. – confidence: the confidence of the association. • <paths:event> which links the ESE record with external events. The element has the following attributes: – source: the name of the external resource of the event (for instance, WordNet). – canonical_form: the canonical word form of the annotated event. – confidence: confidence of the association. 2 Keeping track of this information, for instance, for an interface showing those annotations, as it can emphasize the specific snippet and link it to the Wikipedia/dbpedia article when the user points to it. 3 http://www.europeana.eu/schemas/ese/ 2
  • 16. <record> <!-- Existing ESE record --> <dc:identifier>http://www.thebowesmuseum.org.uk/10432/</dc:identifier> <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri> <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title> <dc:description>This is a random-coursed blue lias ...</dc:description> <dcterms:isPartOf>Bowes Museum</dcterms:isPartOf> <dc:subject>1670</dc:subject> <dc:type>Image</dc:type> <europeana:provider>CultureGrid</europeana:provider> <europeana:isShownAt>http://www.thebowesmuseum.org.uk/10432/</europeana:isShownAt> <europeana:hasObject>false</europeana:hasObject> <europeana:country>uk</europeana:country> <europeana:type>IMAGE</europeana:type> <europeana:language>en</europeana:language> <!-- ESEPaths augmentation --> <!-- item informativeness --> <paths:informativeness>0.7</paths:informativeness> <!-- vocabulary mapping --> <paths:vocabulary confidence="0.8" source="wikicat" URI="http://en.wikipedia.org/wiki/Category:Tower_mills"> Tower Mills</paths:vocabulary> <!-- events --> <paths:event confidence="0.8" source="wordnet" canonical_form="play" start_offset="120" end_offset="127" field="dc:description"> playing</paths:event> <!-- related items --> <paths:related_item confidence="0.8" field="dc:subject" field_no="1" method="LDA"> http://www.europeana.eu/portal/record/09405t/A6F9A </paths:related_item> <!-- background links items --> <paths:background_link source="wikipedia" start_offset="0" end_offset="11" field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0" title="Archaeology"> http://en.wikipedia.org/wiki/Archaeology </paths:background_link> </record> Figure 1: Example of an ESEPaths record 3
  • 17. • <paths:related_item> which links the ESE record with related CH items. The element has the following attributes: – confidence: confidence of the association. – method: which method produced the association – field: the name of the ESE field whose content suggests the similarity relation. – field_no: the position of the ESE field described above (useful in case the ESE records contains more than one field with the same name). • <paths:background_link>: which links the ESE record with an item from an external resource. The element has the following attributes: – source: the name of the external resource. – start_offset: the offset (in characters) within the field element where the text anchor begins. – end_offset: the offset (in characters) within the field element where the text anchor ends. – field: the field of the ESE record where the anchor for this relation is located. – confidence: confidence of the association. – method: which method produced the association. – title: title of the URL which the background link points to. – sentiment: polarity of the textual information included in the corresponding link. It has fixed values, namely “pos” for positive results, “neg” for negative and “neu” for neutral. Figure 1 shows an example of a CH record in ESEPaths. The first lines are just a copy of the original ESE record, whereas the new elements (in the paths namespace) are at the end. Note that identifiers (incl. URIs) are not real, and shortened so that the listing fits on the page. 3 Roadmap for basic conversion of ESEPaths to EDM As said before, all the data produced by the PATHS project is encoded following the ESE format extended with new elements. However, Europeana is switching from ESE to a new data model, EDM. The main difference between ESE and EDM is that the latter is more expressive and based on Semantic Web and Linked Data technologies (RDF, ontologies). In this section, we outline the main design we devise for switching from ESEPaths to EDM. 
The main design criteria we have followed is the following: 1. All PATHS annotations should be properly represented using EDM. 2. It must be possible to retrieve particular PATHS annotations. 3. We should depart as less as possible from standard EDM. 4
  • 18. The first criterion states that all PATHS annotations should be described using EDM. As will be shown below, some annotation attributes are difficult to represent following EDM and, as a consequence, a compromise has to be made between describing PATHS annotations in their full richness and using proper EDM concepts and properties for representing them. The second criterion states that the EDM representation has to respect the types of the PATHS annotations. For instance, it has to be possible to retrieve all background links of a particular CH item (as opposite as, say, its related items). Finally, the last criterion states that we should use widely used EDM objects and properties as possible. In particular, the EDM representation should use the set of elements described by Europeana’s instructions for providers4 , when possible. We now describe the main steps to describe the PATHS annotations to EDM. From ESEPaths to EDM We start describing the resources which are already in Europeana. This includes an Europeana ore:Aggregation resource with information about the digital aggregation process itself (provider, etc)5 . <http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation; edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>; edm:dataProvider "English Heritage - Viewfinder"; edm:provider "CultureGrid"; edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>; edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>; edm:rights <http://www.europeana.eu/rights/rr-f/>. 
Europeana also provides a proxy for the CHO, attached to this aggregation6 : <http://data.europeana.eu/proxy/provider/09405/8F49> a ore:Proxy; ore:proxyFor <http://data.europeana.eu/item/09405/8F49>; ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>; # Original ESE data dc:creator "Davies, J O"; dc:date "[2001]"; dc:title "Stembridge Windmill, High Ham, Somerset"; dc:description "This is a random-coursed blue lias ...". We now describe the way to represent the enrichment annotations as provided by the PATHS project. We encapsulate these annotations into a new ore:Aggregation. This aggregation resource records a first set of enrichments created by the PATHS project over the original CH object. It includes all relevant information like provider name, access rights, etc. as well as the annotations referring to the whole CH object, as opposed to enrichment information extracted from some subset of the CH object’s metadata. 4 http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders The resource identifier of the aggregation used in the example is not real. The real one should be provided by Europeana. 6 Note again that the resource identifier of the proxy used in the example is not real. 5 5
    <http://www.paths-project.eu/aggregation/paths/09405/8F49>
        a ore:Aggregation ;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49> ;
        edm:provider "PATHS" ;
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8> ;
        edm:rights <http://www.paths-project.eu/rights/rr-f/> ;
        # item informativeness
        paths:informativeness "0.7" .

A few points are worth noting:

• The edm:isShownAt property points to the original record, as the PATHS project does not store any information beyond the enrichment of CH items.
• The edm:rights property refers to the annotated information (rather than to the rights of the original CH item).
• As said before, the paths:informativeness element is attached to the PATHS aggregation resource because it refers to the CH object as a whole.

Finally, we create a proxy resource for the PATHS aggregation and describe the remaining PATHS annotations as properties of this resource:

    <http://www.paths-project.eu/proxy/paths/09405/8F49>
        a ore:Proxy ;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49> ;
        ore:proxyIn <http://www.paths-project.eu/aggregation/paths/09405/8F49> ;
        # vocabulary mapping
        edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills> ;
        # events
        edm:isRelatedTo <http://www.paths-project.eu/event/playing> ;
        # related items
        edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A> ;
        # background links
        edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology> .  # Or <http://dbpedia.org/resource/Archaeology>

Representing various types of enrichment. As shown in the example, the proxy resource relates the CH item to external resources such as vocabulary concepts, events, related items or objects from external sources (such as Wikipedia or DBpedia). Because all these associations use the high-level edm:isRelatedTo property, the types of the related external objects must be declared explicitly. Otherwise, there would be no way to discriminate among the different types of PATHS annotations (for instance, no way to retrieve only the vocabulary concepts related to a CH object). As a first solution, we can include a separate description of the resources linked to the CH object using SKOS [7]. Within PATHS we define the following types of external resources:

• Related CH items are of type paths:RelatedItemConcept, a subclass of skos:Concept.
• Vocabulary concepts are of type skos:Concept.
• Events are of type paths:EventConcept, a subclass of skos:Concept. It represents any concept that refers to a (type of) event (such as “run”, “play”, etc.).
• Background links are of type paths:BackgroundLinkConcept, a subclass of skos:Concept.

Note that these classes are meant to offer a way to discriminate among the different types of annotations inside the PATHS project. The classes are therefore loosely defined, in the sense that they do not describe the proper semantic type of the resources. For instance, PATHS can relate a CH object to a DBpedia resource representing a place (New_York), a person (Pablo_Picasso), etc. However, within the scope of the PATHS annotations, the only explicit type common to all those resources is the one they inherit from their “background link” status.

Also note that, for the time being, Europeana would not be able to fully ingest data that uses such subclasses, as they depart from the set of elements described in Europeana's instructions for providers [8]. Ingesting them would require Europeana to handle specialisations of EDM, for which no precise schedule exists at the time of writing.

Based on the above, we also include the following statements in the example:

    <http://www.paths-project.eu/vocabulary/Tower_mills>
        a skos:Concept ;
        skos:prefLabel "Tower Mills"@en .

    <http://www.paths-project.eu/event/playing>
        a paths:EventConcept ;
        skos:prefLabel "playing"@en .

    <http://www.europeana.eu/portal/record/09405t/A6F9A>
        a paths:RelatedItemConcept .

    <http://en.wikipedia.org/wiki/Archaeology>
        a paths:BackgroundLinkConcept ;
        skos:prefLabel "Archaeology"@en .

along with the definitions of these new types:

    paths:EventConcept
        a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Event Concept"@en ;
        skos:definition "A concept describing an Event"@en .

    paths:RelatedItemConcept
        a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Related Item Concept"@en ;
        skos:definition "A concept describing a CH record"@en .

    paths:BackgroundLinkConcept
        a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Background Link Concept"@en ;
        skos:definition "A concept describing an object from an external source such as dbpedia"@en .

The above definitions can be placed next to the annotation data, in a separate file provided directly to Europeana or others, or even served over the Web in a Linked Data scenario. The whole EDM representation for the item is shown in Figure 2.

[7] http://www.w3.org/2004/02/skos
[8] http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders
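Because each PATHS-specific class defined above is declared as an rdfs:subClassOf skos:Concept, a consumer that only understands SKOS could, in principle, still treat the typed resources as plain concepts by following the subclass links. The following is only a minimal sketch of that idea in Python. The dictionary, the function name generalise and the set of "known" types are assumptions made for illustration; they are not part of PATHS or Europeana tooling, and this is not a description of how Europeana ingests data.

```python
# Illustrative only: rdfs:subClassOf declarations copied from the
# class definitions given in the text (child class -> parent class).
SUBCLASS_OF = {
    "paths:EventConcept": "skos:Concept",
    "paths:RelatedItemConcept": "skos:Concept",
    "paths:BackgroundLinkConcept": "skos:Concept",
}


def generalise(rdf_type, known_types=frozenset({"skos:Concept"})):
    """Walk rdfs:subClassOf upwards until a type the consumer knows is found.

    Returns the first known type on the path, or None if there is none.
    """
    seen = set()
    current = rdf_type
    while current is not None and current not in seen:
        if current in known_types:
            return current
        seen.add(current)  # guards against accidental cycles
        current = SUBCLASS_OF.get(current)
    return None


fallback_for_event = generalise("paths:EventConcept")
```

Under these assumptions, a resource typed paths:EventConcept falls back to skos:Concept, while a type with no declared superclass yields no fallback at all.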
    <http://data.europeana.eu/aggregation/provider/09405/8F49>
        a ore:Aggregation ;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49> ;
        edm:dataProvider "English Heritage - Viewfinder" ;
        edm:provider "CultureGrid" ;
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8> ;
        edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg> ;
        edm:rights <http://www.europeana.eu/rights/rr-f/> .

    <http://data.europeana.eu/proxy/europeana/09405/8F49>
        a ore:Proxy ;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49> ;
        ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49> ;
        # Existing ESE record
        dc:creator "Davies, J O" ;
        dc:date "[2001]" ;
        dc:title "Stembridge Windmill, High Ham, Somerset" ;
        dc:description "This is a random-coursed blue lias ..." .

    <http://www.paths-project.eu/aggregation/europeana/09405/8F49>
        a ore:Aggregation ;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49> ;
        edm:provider "PATHS" ;
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8> ;
        edm:rights <http://www.paths-project.eu/rights/rr-f/> ;
        # item informativeness
        paths:informativeness "0.7" .

    <http://www.paths-project.eu/proxy/europeana/09405/8F49>
        a ore:Proxy ;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49> ;
        ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49> ;
        # vocabulary mapping
        edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills> ;
        # events
        edm:isRelatedTo <http://www.paths-project.eu/event/playing> ;
        # related items
        edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A> ;
        # background links
        edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology> .  # Or <http://dbpedia.org/resource/Archaeology>

    <http://www.paths-project.eu/vocabulary/Tower_mills>
        a skos:Concept ;
        skos:prefLabel "Tower Mills"@en .

    <http://www.paths-project.eu/event/playing>
        a paths:EventConcept ;
        skos:prefLabel "playing"@en .

    <http://www.europeana.eu/portal/record/09405t/A6F9A>
        a paths:RelatedItemConcept .

Figure 2: EDM representation of the ESEPaths example
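With the type statements in place, the different kinds of PATHS enrichment can be told apart even though they all use edm:isRelatedTo. The sketch below illustrates the idea in plain Python over a hand-copied subset of the triples in Figure 2. The tuple representation and the function name related_by_type are assumptions made for illustration only; a real system would more likely query an RDF store.

```python
PROXY = "http://www.paths-project.eu/proxy/europeana/09405/8F49"
TOWER_MILLS = "http://www.paths-project.eu/vocabulary/Tower_mills"
PLAYING = "http://www.paths-project.eu/event/playing"
RELATED_ITEM = "http://www.europeana.eu/portal/record/09405t/A6F9A"

# (subject, predicate, object) triples, a subset hand-copied from Figure 2.
TRIPLES = [
    (PROXY, "edm:isRelatedTo", TOWER_MILLS),
    (PROXY, "edm:isRelatedTo", PLAYING),
    (PROXY, "edm:isRelatedTo", RELATED_ITEM),
    (TOWER_MILLS, "rdf:type", "skos:Concept"),
    (PLAYING, "rdf:type", "paths:EventConcept"),
    (RELATED_ITEM, "rdf:type", "paths:RelatedItemConcept"),
]


def related_by_type(triples, proxy, wanted_type):
    """Return the resources that `proxy` links to via edm:isRelatedTo
    and that are explicitly declared to be of `wanted_type`."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == wanted_type}
    return [
        o
        for s, p, o in triples
        if s == proxy and p == "edm:isRelatedTo" and o in typed
    ]


vocabulary_terms = related_by_type(TRIPLES, PROXY, "skos:Concept")
events = related_by_type(TRIPLES, PROXY, "paths:EventConcept")
```

The match is on the declared type only and no subclass inference is performed, so asking for skos:Concept returns just the vocabulary term and not the event or the related item.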
Using specific metadata fields to represent enrichments. Alternatively, if a PATHS enrichment is known to be certain, a new metadata field can be created for the CH object. For instance, if the mapping of the CH record to a vocabulary concept is known to be correct, we can create a new dc:subject field linking the CH record to the appropriate vocabulary concept. Note, however, that PATHS enrichments are produced automatically, and it is not certain that a concept enrichment derived from a dc:subject field would justify a dc:subject relation between the object and the concept. The link to the concept may have been identified from only a small part of the original field, thus missing some of the original semantics. Some manual assessment is therefore needed before an annotation is promoted to a proper metadata field.

4 Using Open Annotation to represent attributes in relations

The roadmap described in the previous section covers the main aspects of ESEPaths. However, there is one piece of ESEPaths data that cannot easily be represented in EDM, because EDM inherits RDF's focus on binary relations: attributes on relations. Almost all annotations created by the PATHS project carry additional information. In particular, many annotations record a confidence value describing how certain the automatic method was when it created the annotation. One way to overcome this limitation in an RDF-based model is to reify the annotation as an instance of a dedicated class and to represent the annotation attributes as properties of that instance. For this we can re-use elements from the Open Annotation (OA) model [9]. Consider this ESEPaths snippet:

    <record>
      ...
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
                             field="dc:subject" confidence="0.015"
                             method="wikipedia-miner-1.2.0" title="Archaeology">
        http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>

We would create the following oa:Annotation for it:

    background_link1
        a oa:Annotation ;
        a paths:BackgroundLinkAnnotation ;
        oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;  # Or <http://dbpedia.org/resource/Archaeology>
        paths:source <http://en.wikipedia.org> ;  # Or <dbpedia.org>
        paths:confidence "0.015" .

In the example, the <paths:background_link> annotation has been converted (reified) into an oa:Annotation resource, background_link1, of type paths:BackgroundLinkAnnotation, linked by the oa:hasTarget relation to the PATHS proxy resource. The attributes of the original relation are now represented as properties of this new resource.

An alternative to the above approach would be to use the OA “motivation” property to characterise the annotation. The OA motivation is meant to represent “the reasons why the Annotation was created, not just the agents involved” [10], which fits particularly well with the kind of information we want to represent. The “motivation” approach would lead to the following triples:

    background_link1
        a oa:Annotation ;
        oa:motivatedBy paths:backgroundLinkMotivation ;
        oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;  # Or <http://dbpedia.org/resource/Archaeology>
        paths:source <http://en.wikipedia.org> ;  # Or <dbpedia.org>
        paths:confidence "0.015" .

In this case, the resource created for the <paths:background_link> element is of type oa:Annotation and is oa:motivatedBy paths:backgroundLinkMotivation, an instance of skos:Concept.

Both approaches solve the main problem of attaching attributes to relations. They also remove the need to define PATHS-specific relations such as paths:background_link, which would conflict with the metadata fields currently used by EDM. Note, however, that the properties of the newly defined reified annotations are still specific to PATHS (paths:source, paths:confidence, etc.).

On a side note, using reified annotations raises the question of whether we should still keep the proxy-based representation alongside them. Because all the PATHS enrichment data is now attached to the reified annotation, the Proxy object described in Section 3 conveys little or no information compared to the original data.

4.1 Offsets and selectors

There is another piece of ESEPaths data that is not currently represented in EDMPaths, namely the field and offset attributes of the relations. Because all PATHS annotations are extracted from the textual content of some metadata field in the original CH record, ESEPaths annotations keep track of the original text snippet (called the anchor) that was used to derive the enrichment. To track this kind of provenance information, EDM could re-use the selectors from the Open Annotation model [11]. For instance, consider the following ESEPaths snippet:

    <record>
      ...
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      ...
      <paths:background_link start_offset="0" end_offset="11" field="dc:subject" ... >
        http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>

[9] http://www.openannotation.org/spec/core/
[10] http://www.openannotation.org/spec/core/core.html#Motivations
[11] http://www.openannotation.org/spec/core/specific.html#Selectors
The snippet describes a “background link” annotation for the CH object “09405/8F49”, which was extracted by analysing offsets 0-11 of the dc:subject field of the original record. These offsets could be translated into the following Open Annotation snippet:

    background_link1
        a oa:Annotation ;
        oa:hasTarget anchor1 ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> .

    anchor1
        a oa:SpecificResource ;
        oa:hasSource ??? ;  # what type does this object have?
        oa:hasSelector selector1 .

    selector1
        a oa:TextPositionSelector ;
        oa:start 0 ;
        oa:end 11 .

As noted in the snippet, the open problem is what resource, and of which type, anchor1 should have as its source. This object should represent the dc:subject field of CH record “09405/8F49”, but there is currently no way to describe this with EDM. We thus decided to leave this piece of information out of our proposed solution.

5 Conclusion

In this work we describe a method for representing automatically created PATHS annotations in EDM. We first describe a simple way of representing the annotations and discuss its benefits and drawbacks. One important weakness of the simple annotation schema lies in its inability to represent attributes of annotations, such as confidence scores. To overcome this limitation we propose a more complex solution that involves reifying the annotations as instances of dedicated classes and representing the annotation attributes as properties of those instances. For this we have re-used elements from the Open Annotation (OA) model.

The method presented here, called EDMPaths, is able to properly represent the annotations following EDM, but some information that was present in ESE has been left out. In particular, information about the offset of the anchor that caused the annotation to be produced has proven difficult to represent.

One of our main design goals has been to avoid creating new non-standard classes and properties when defining EDMPaths. We think we have succeeded in this respect, mainly by reusing elements from initiatives such as the Open Annotation model. However, the proposal still describes some properties that are specific to the PATHS project.

References

[Agirre and de Lacalle, 2011] Agirre, E. and de Lacalle, O. L. (2011). D2.1: Processing and representation of content for first prototype. Technical report, PATHS project.

[Doerr et al., 2010] Doerr, M., Gradmann, S., Hennicke, S., Isaac, A., Meghini, C., and van de Sompel, H. (2010). The Europeana Data Model (EDM). In World Library and Information Congress: 76th IFLA General Conference and Assembly, pages 10–15.

[Otegi et al., 2012] Otegi, A., Agirre, E., and Soroa, A. (2012). D2.2: Processing and representation of content for second prototype. Technical report, PATHS project.