PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enrichment
Eneko Agirre, Ander Barrena, Kike Fernandez, Esther Miranda,
Arantxa Otegi, and Aitor Soroa
IXA NLP Group, University of the Basque Country UPV/EHU
arantza.otegi@ehu.es
Abstract. Large amounts of cultural heritage material are nowadays
available through online digital library portals. Most of these cultural
items have short descriptions and lack rich contextual information. The
PATHS project has developed experimental enrichment services. As a
proof of concept, this paper presents a web service prototype which allows
independent content providers to enrich cultural heritage items with a
subset of the full functionality: links to related items in the collection
and links to related Wikipedia articles. In the future, we plan to provide
more advanced functionality, such as that already available offline within PATHS.

1 Introduction

Large amounts of cultural heritage (CH) material are now available through online digital library portals, such as Europeana¹. Europeana hosts millions of books, paintings, films, museum objects and archival records that have been digitised throughout Europe. Europeana collects contextual information, or metadata, about different types of content, which users can use for their searches.
The main strength of Europeana lies in the vast number of items it contains. Sometimes, though, this quantity comes at the cost of a restricted amount of metadata, with many items having very short descriptions and lacking rich contextual information. One of the goals of the PATHS project² is precisely to enrich CH items, using a selected subset of Europeana as a testbed [1].
Within the project, this enrichment will make it possible to create a system that acts as an interactive personalised tour guide through Europeana collections, offering suggestions about items to look at and assisting in their interpretation by providing relevant contextual information, both from related items within Europeana and from external sources such as Wikipedia. Users of such digital libraries may require information for purposes such as learning and seeking answers to questions. This additional information supports users in fulfilling their information needs, as the evaluation of the first PATHS prototype shows [2].
In this paper we present a web service prototype which allows independent content providers to enrich CH items. Specifically, the service enriches each item with two types of information. On the one hand, the item is linked to similar items within the collection. On the other hand, the item is linked to Wikipedia articles which are related to it.
There have been many attempts to automatically enrich cultural heritage metadata. Some projects (for instance, MIMO-DB³ or MERLIN⁴) relate CH objects to terms from an external authority file or vocabulary. Others (like MACE⁵ or YUMA⁶) adopt a collaborative annotation paradigm for metadata enrichment. To our knowledge, PATHS is the first project to use semantic NLP techniques to link CH items to similar items or to external Wikipedia articles.
The current service has limited bandwidth and provides a selected subset of the enrichment functionality available internally in the PATHS project. The quality of the links produced is also slightly lower, although we plan to improve it in the near future. Nevertheless, we think that the prototype is useful to demonstrate the potential of a web service for automatically enriching CH items with high-quality information.

¹ http://www.europeana.eu/portal/
² http://www.paths-project.eu

T. Aalberg et al. (Eds.): TPDL 2013, LNCS 8092, pp. 462–465, 2013.
© Springer-Verlag Berlin Heidelberg 2013

2 Demo Description

The web service takes as input one CH item represented according to the Europeana Data Model (EDM) in JSON format, as exported by the Europeana API v2.0⁷ (a sample record is provided in the interface). The web service returns the following:
– A list of 10 closely related items within the collection.
– A list of Wikipedia pages which are related to the target item.
Figure 1 shows a snapshot of the web service. The service is publicly accessible at http://ixa2.si.ehu.es/paths_wp2/paths_wp2.pl.
The enrichment is performed by analyzing the metadata associated with the item, i.e., its title, its description, etc. The next sections briefly describe how this enrichment is performed.
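As a minimal sketch of this first step, the Python function below collects the text of the title, subject and description fields from an EDM JSON record. It assumes, for illustration only, a layout like that of a Europeana API v2 record response: an `object` holding a list of `proxies`, whose `dcTitle`, `dcSubject` and `dcDescription` entries map language codes to lists of strings. The exact input layout accepted by the PATHSenrich service may differ, and `extract_fields` is a hypothetical helper, not part of the service.

```python
import json
from typing import Dict, List

# Assumed EDM field names, following Europeana API v2-style proxies (hypothetical here).
FIELDS = ("dcTitle", "dcSubject", "dcDescription")


def extract_fields(record: dict) -> Dict[str, List[str]]:
    """Collect every value of the title, subject and description fields.

    Each field is assumed to be a language map such as
    {"def": ["Portrait of a lady"], "en": [...]}.
    """
    collected: Dict[str, List[str]] = {name: [] for name in FIELDS}
    for proxy in record.get("object", {}).get("proxies", []):
        for name in FIELDS:
            lang_map = proxy.get(name) or {}
            # Merge the values of all languages into one list per field.
            for values in lang_map.values():
                collected[name].extend(values)
    return collected


# A made-up record used only to exercise the function.
sample = json.loads("""
{"object": {"proxies": [
  {"dcTitle": {"def": ["Portrait of a lady"]},
   "dcSubject": {"en": ["painting", "portrait"]},
   "dcDescription": {"def": ["Oil on canvas, Flemish school"]}}
]}}
""")
print(extract_fields(sample))
```

Missing fields simply yield empty lists, so records with sparse metadata (the common case the paper describes) still produce a usable, if short, text.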
2.1 Related Items within the Collection

The list of related items is obtained by first creating a query with the content of the title, subject and description fields (stopwords are removed). The query is then posted to a SOLR search engine⁸. The SOLR search engine accesses an index created with the subset of Europeana items already enriched offline within the PATHS project. In this way, the most related Europeana items in the subset are obtained, and the identifiers of those related items are listed. Note that the related items used internally in the PATHS project are produced using more sophisticated methods. Please refer to [1] for further details.
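The query construction described above can be sketched as follows. This is an illustration under stated assumptions, not the PATHS code: the stopword list is a tiny toy list, and the Solr core URL (`SOLR_SELECT`) and the default search field `text` are hypothetical. The request parameters `q`, `rows`, `fl`, `df` and `wt` are standard Solr `/select` parameters; `rows=10` mirrors the ten related items returned by the service.

```python
import re
from urllib.parse import urlencode

# Toy stopword list for illustration; a real system would use a fuller list.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to", "with"}

# Hypothetical Solr core holding the enriched PATHS subset (not the real endpoint).
SOLR_SELECT = "http://localhost:8983/solr/paths/select"


def build_query(title: str, subject: str, description: str) -> str:
    """Concatenate the three fields, tokenise them and drop stopwords."""
    text = " ".join([title, subject, description]).lower()
    tokens = re.findall(r"\w+", text)
    return " ".join(t for t in tokens if t not in STOPWORDS)


def solr_url(query: str, rows: int = 10) -> str:
    """Build a Solr /select request returning the ids of the top `rows` hits as JSON."""
    params = {
        "q": query,
        "rows": rows,
        "fl": "id",    # only the identifiers of the related items are needed
        "df": "text",  # hypothetical catch-all field of the index
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


q = build_query("Portrait of a lady", "painting", "Oil on canvas by a Flemish painter")
print(q)  # portrait lady painting oil canvas flemish painter
print(solr_url(q))
```

Keeping only lower-cased word tokens has a useful side effect: punctuation and upper-case operators that Solr's query parser would interpret as query syntax never reach the search engine, so the query behaves as a plain bag of keywords.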
³ http://www.mimo-international.com
⁴ http://www.ucl.ac.uk/ls/merlin
⁵ http://www.mace-project.eu
⁶ http://dme.ait.ac.at/annotation
⁷ http://preview.europeana.eu/portal/api-introduction.html
⁸ http://lucene.apache.org/solr/

Fig. 1. Web service interface. It consists of a text area in which to enter the input item in JSON format (top). The “Get EDM JSON example” button can be used to get an input example. Once a JSON record has been entered, click the “Process” button to get the output. The output (bottom) consists of a list of related items and background links.

2.2 Related Wikipedia Articles

For linking the items to Wikipedia articles we follow an implementation similar to the method described in [3]. This method creates a dictionary, i.e., an association between string mentions and all the possible articles each mention can refer to. Our dictionary is constructed using the titles of Wikipedia articles, redirect pages, disambiguation pages and the anchor texts of Wikipedia links. Mentions are lower-cased and all text between parentheses is removed. If the mention links to a disambiguation page, it is associated with all the articles the disambiguation page points to. In addition, each association between a mention and an article is scored with its prior probability, estimated from the number of times that the mention occurs as the anchor text of a link to that article. Note that such a dictionary can disambiguate any mention by simply returning the highest-scoring article for that mention.
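A minimal sketch of such a dictionary is shown below. It assumes that the prior probability is the conditional estimate P(article | mention), i.e. the number of links with that anchor text pointing to the article divided by the number of links with that anchor text overall; this normalisation is our reading of "prior probability", not a detail stated above. Candidates that come only from titles, redirects or disambiguation pages receive a score of 0.0, which is also a simplification of ours. The function names and the toy Wikipedia data are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

PAREN = re.compile(r"\s*\([^)]*\)")


def normalise(mention: str) -> str:
    """Lower-case a mention and remove any text between parentheses."""
    return PAREN.sub("", mention).strip().lower()


def build_dictionary(
    links: Iterable[Tuple[str, str]],   # (anchor text, target article), one per link
    names: Iterable[Tuple[str, str]],   # (title or redirect text, article)
    disambig: Dict[str, List[str]],     # disambiguation page -> articles it lists
) -> Dict[str, Dict[str, float]]:
    counts: Dict[str, Counter] = defaultdict(Counter)  # mention -> article -> anchor count
    candidates: Dict[str, set] = defaultdict(set)      # mention -> possible articles

    for text, article in names:
        candidates[normalise(text)].add(article)
    for page, targets in disambig.items():
        candidates[normalise(page)].update(targets)

    for text, target in links:
        mention = normalise(text)
        if target in disambig:
            # A link to a disambiguation page: associate the mention with all its targets.
            candidates[mention].update(disambig[target])
        else:
            counts[mention][target] += 1
            candidates[mention].add(target)

    dictionary = {}
    for mention, articles in candidates.items():
        total = sum(counts[mention].values())
        dictionary[mention] = {
            a: (counts[mention][a] / total if total else 0.0) for a in articles
        }
    return dictionary


links = [("Mercury", "Mercury (planet)"), ("Mercury", "Mercury (planet)"),
         ("mercury", "Mercury (element)"), ("Mercury", "Mercury (disambiguation)")]
names = [("Mercury (planet)", "Mercury (planet)"), ("Quicksilver", "Mercury (element)")]
disambig = {"Mercury (disambiguation)":
            ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"]}

d = build_dictionary(links, names, disambig)
best = max(d["mercury"], key=d["mercury"].get)
print(best)  # Mercury (planet)
```

Because every mention carries a score for each of its candidate articles, disambiguation reduces to an `argmax` over that mapping, which is exactly the context-free behaviour described above.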
Once the dictionary is built, the web service analyzes the title, subject and description fields of the CH item and matches the longest substrings within those fields against entries in the dictionary. When a match is found, the Wikipedia article with the highest score for that entry is returned. Note that the links to Wikipedia in the PATHS project are produced using more sophisticated methods. Please refer to [1] for further details.
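The matching step can be sketched as a greedy, left-to-right longest match over token n-grams. Matching whole tokens rather than arbitrary character substrings, the `max_len` limit of six tokens, the helper name `link_text` and the toy dictionary are all assumptions made for this illustration; the dictionary has the same mention-to-scores shape as the one built in the previous step.

```python
import re
from typing import Dict, List, Tuple

Dictionary = Dict[str, Dict[str, float]]


def link_text(text: str, dictionary: Dictionary, max_len: int = 6) -> List[Tuple[str, str]]:
    """Greedy left-to-right longest match of token n-grams against the dictionary.

    Returns (mention, article) pairs, where the article is the highest-scoring
    entry for the matched mention.
    """
    tokens = re.findall(r"\w+", text.lower())
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at token i first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            scores = dictionary.get(mention)
            if scores:
                links.append((mention, max(scores, key=scores.get)))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no dictionary entry starts at this token


    return links


# Toy dictionary with made-up scores.
toy = {
    "leonardo da vinci": {"Leonardo da Vinci": 0.95, "Leonardo da Vinci (film)": 0.05},
    "leonardo": {"Leonardo DiCaprio": 0.6, "Leonardo da Vinci": 0.4},
    "mona lisa": {"Mona Lisa": 1.0},
}
print(link_text("Mona Lisa, painted by Leonardo da Vinci", toy))
# [('mona lisa', 'Mona Lisa'), ('leonardo da vinci', 'Leonardo da Vinci')]
```

Preferring the longest span is what lets "Leonardo da Vinci" win over the shorter, more ambiguous entry "leonardo". When only the short form appears, the context-free prior alone decides, which is the weakness that taking the context of matched entities into account would address.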
3 Conclusions and Future Work

This paper presents a web service prototype which automatically enriches CH items with metadata. The web service is inspired by the enrichment work carried out in the PATHS project but, in contrast to the batch methodology used in the project, the enrichment is performed online. The prototype has been designed for demonstration purposes, to showcase the feasibility of providing full-fledged automatic enrichment.
Our plans for the future include moving the offline enrichment services which are currently being evaluated in the PATHS project to the web service. In the case of related Wikipedia articles, we will take into account the context of the matched entities, which improves the quality of the links [4], and we will include a filtering algorithm to discard entities that are not relevant. Regarding related items, we will classify them according to the type of relation [5]. In addition, we plan to automatically organize the items hierarchically, according to a Wikipedia-based vocabulary [6].
Acknowledgements. The research leading to these results was carried out as part of the PATHS project (http://www.paths-project.eu) funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270082. This work has also been funded by the Basque Government (project IBILBIDE, SAIOTEK S-PE12UN089).

References
1. Otegi, A., Agirre, E., Soroa, A., Aletras, N., Chandrinos, C., Fernando, S., Gonzalez-Agirre, A.: Report accompanying D2.2: Processing and Representation of Content for Second Prototype. PATHS Project Deliverable (2012), http://www.paths-project.eu/eng/content/download/2489/18113/version/2/file/D2.2.Content+Processing-2nd+Prototype-revised.v2.pdf
2. Griffiths, J., Goodale, P., Minelli, S., de Polo, A., Agerri, R., Soroa, A., Hall, M., Bergheim, S.R., Chandrinos, K., Chryssochoidis, G., Fernie, K., Usher, T.: D5.1: Evaluation of the first PATHS prototype. PATHS Project Deliverable (2012), http://www.paths-project.eu/eng/Resources/D5.1-Evaluation-of-the-1st-PATHS-Prototype
3. Chang, A.X., Spitkovsky, V.I., Yeh, E., Agirre, E., Manning, C.D.: Stanford-UBC entity linking at TAC-KBP. In: Proceedings of TAC 2010, Gaithersburg, Maryland, USA (2010)
4. Han, X., Sun, L.: A Generative Entity-Mention Model for Linking Entities with Knowledge Base. In: Proceedings of the ACL, Portland, Oregon, USA (2011)
5. Agirre, E., Aletras, N., Gonzalez-Agirre, A., Rigau, G., Stevenson, M.: UBC UOS-TYPED: Regression for typed-similarity. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Atlanta, Georgia, USA (2013)
6. Fernando, S., Hall, M., Agirre, E., Soroa, A., Clough, P., Stevenson, M.: Comparing Taxonomies for Organising Collections of Documents. In: Proceedings of COLING 2012, Mumbai, India (2012)

Generating Paths through Cultural Heritage Collections Latech2013 paper
 
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
 
PATHS state of the art monitoring report
PATHS state of the art monitoring reportPATHS state of the art monitoring report
PATHS state of the art monitoring report
 
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paperGenerating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
 
PATHS @ LATECH 2013
PATHS @ LATECH 2013PATHS @ LATECH 2013
PATHS @ LATECH 2013
 
PATHS at the eChallenges conference
PATHS at the eChallenges conferencePATHS at the eChallenges conference
PATHS at the eChallenges conference
 
PATHS at the EAA conference 2013
PATHS at the EAA conference 2013PATHS at the EAA conference 2013
PATHS at the EAA conference 2013
 
PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013
 
Comparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentationComparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentation
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
A pilot on Semantic Textual Similarity
A pilot on Semantic Textual SimilarityA pilot on Semantic Textual Similarity
A pilot on Semantic Textual Similarity
 
Comparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documentsComparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documents
 
PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0
 
PATHS Evaluation of the 1st paths prototype
PATHS Evaluation of the 1st paths prototypePATHS Evaluation of the 1st paths prototype
PATHS Evaluation of the 1st paths prototype
 
PATHS Second prototype-functional-spec
PATHS Second prototype-functional-specPATHS Second prototype-functional-spec
PATHS Second prototype-functional-spec
 
PATHS Final state of art monitoring report v0_4
PATHS  Final state of art monitoring report v0_4PATHS  Final state of art monitoring report v0_4
PATHS Final state of art monitoring report v0_4
 
PATHS first paths prototype
PATHS first paths prototypePATHS first paths prototype
PATHS first paths prototype
 
PATHS Content processing 2nd prototype-revised.v2
PATHS Content processing 2nd prototype-revised.v2PATHS Content processing 2nd prototype-revised.v2
PATHS Content processing 2nd prototype-revised.v2
 
PATHS Content processing 1st prototype
PATHS  Content processing 1st prototypePATHS  Content processing 1st prototype
PATHS Content processing 1st prototype
 
PATHS system architecture
PATHS system architecturePATHS system architecture
PATHS system architecture
 

Kürzlich hochgeladen

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Kürzlich hochgeladen (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enrichment, @TPDL 2013

PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enrichment

Eneko Agirre, Ander Barrena, Kike Fernandez, Esther Miranda, Arantxa Otegi, and Aitor Soroa
IXA NLP Group, University of the Basque Country UPV/EHU
arantza.otegi@ehu.es

Abstract. Large amounts of cultural heritage material are nowadays available through online digital library portals. Most of these cultural items have short descriptions and lack rich contextual information. The PATHS project has developed experimental enrichment services. As a proof of concept, this paper presents a web service prototype which allows independent content providers to enrich cultural heritage items with a subset of the full functionality: links to related items in the collection and links to related Wikipedia articles. In the future we plan to offer the more advanced functionality that is currently available offline within PATHS.

1 Introduction

Large amounts of cultural heritage (CH) material are now available through online digital library portals, such as Europeana¹. Europeana hosts millions of books, paintings, films, museum objects and archival records that have been digitised throughout Europe. Europeana collects contextual information, or metadata, about different types of content, which users can use in their searches. The main strength of Europeana lies in the vast number of items it contains. Sometimes, though, this quantity comes at the cost of a restricted amount of metadata, with many items having very short descriptions and lacking rich contextual information. One of the goals of the PATHS project² is precisely to enrich CH items, using a selected subset of Europeana as a testbed [1].
Within the project, this enrichment will make it possible to create a system that acts as an interactive, personalised tour guide through Europeana collections, offering suggestions about items to look at and assisting in their interpretation by providing relevant contextual information from related items within Europeana and from external sources such as Wikipedia. Users of such digital libraries may require information for purposes such as learning or finding answers to questions, and the evaluation of the first PATHS prototype shows that this additional information helps them fulfil their information needs [2].

In this paper we present a web service prototype which allows independent content providers to enrich CH items. Specifically, the service enriches an item with two types of information: on the one hand, the item is linked to similar items within the collection; on the other hand, it is linked to related Wikipedia articles.

¹ http://www.europeana.eu/portal/
² http://www.paths-project.eu

T. Aalberg et al. (Eds.): TPDL 2013, LNCS 8092, pp. 462–465, 2013.
© Springer-Verlag Berlin Heidelberg 2013

There have been many attempts to enrich cultural heritage metadata automatically. Some projects (for instance, MIMO-DB³ or MERLIN⁴) relate CH objects to terms from an external authority or vocabulary. Others (such as MACE⁵ or YUMA⁶) adopt a collaborative annotation paradigm for metadata enrichment. To our knowledge, PATHS is the first project to use semantic NLP processing to link CH items to similar items or to external Wikipedia articles.

The current service has limited bandwidth and provides a selected subset of the enrichment functionality available internally in the PATHS project. The quality of the links it produces is also slightly lower, although we plan to improve it in the near future. Nevertheless, we think the prototype is useful for demonstrating the potential of a web service that automatically enriches CH items with high-quality information.

2 Demo Description

The web service takes as input one CH item represented in the Europeana Data Model (EDM) in JSON format, as exported by the Europeana API v2.0⁷ (a sample record is provided in the interface). The web service returns the following:
– A list of 10 closely related items within the collection.
– A list of Wikipedia pages which are related to the target item.

Figure 1 shows a snapshot of the web service. The service is publicly accessible at http://ixa2.si.ehu.es/paths_wp2/paths_wp2.pl. The enrichment is performed by analyzing the metadata associated with the item, i.e., its title, description, etc. The following sections briefly describe how this enrichment is performed.
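The input step described above can be illustrated with a minimal Python sketch that pulls out the text fields the enrichment relies on. It assumes a simplified, hypothetical record layout with flat `title`, `subject` and `description` keys; the actual EDM JSON exported by the Europeana API v2.0 nests these fields differently, so the key names below are placeholders rather than the service's real input schema.

```python
import json

# Hypothetical flat layout; real Europeana API v2.0 EDM JSON nests these fields.
ENRICHMENT_FIELDS = ("title", "subject", "description")


def as_text(value) -> str:
    """Flatten a metadata value that may be missing, a string or a list."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    return " ".join(str(v) for v in value)


def extract_metadata(record_json: str) -> dict:
    """Return the title, subject and description text of one JSON record."""
    record = json.loads(record_json)
    # Fields other than the three analysed ones (creator, rights, ...) are ignored.
    return {field: as_text(record.get(field)) for field in ENRICHMENT_FIELDS}


if __name__ == "__main__":
    sample = '{"title": "Pilgrims at Santiago", "subject": ["pilgrimage", "architecture"]}'
    print(extract_metadata(sample))
```

Missing fields map to empty strings, so later steps can concatenate the three fields without special cases.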
2.1 Related Items within the Collection

The list of related items is obtained by first creating a query from the content of the title, subject and description fields, with stopwords removed. The query is then posted to a SOLR search engine⁸, which accesses an index built from the subset of Europeana items already enriched offline within the PATHS project. In this way, the most closely related Europeana items in the subset are retrieved and their identifiers are listed. Note that the related items used internally in the PATHS project are produced using more sophisticated methods; please refer to [1] for further details.

³ http://www.mimo-international.com
⁴ http://www.ucl.ac.uk/ls/merlin
⁵ http://www.mace-project.eu
⁶ http://dme.ait.ac.at/annotation
⁷ http://preview.europeana.eu/portal/api-introduction.html
⁸ http://lucene.apache.org/solr/
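The query construction in Section 2.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the stopword list is a tiny illustrative one, the Solr host and core name (`paths_items`) are hypothetical, and only standard Solr `select` parameters (`q`, `rows`, `fl`, `wt`) are used. The sketch builds the request URL but does not contact a server; `rows=10` mirrors the ten related items returned by the service.

```python
import re
from urllib.parse import urlencode

# Hypothetical, deliberately tiny stopword list; a real system would use a full one.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "of", "on", "the", "to", "with"}

# Hypothetical Solr endpoint: host and core name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/paths_items/select"


def field_text(value) -> str:
    """Flatten a field value that may be missing, a string or a list."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    return " ".join(str(v) for v in value)


def build_query(item: dict) -> str:
    """Concatenate title, subject and description, then drop stopwords."""
    text = " ".join(field_text(item.get(f)) for f in ("title", "subject", "description"))
    # \w+ tokenisation also discards characters Solr's query parser treats as syntax.
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def build_solr_request(item: dict, rows: int = 10) -> str:
    """Return a GET URL asking Solr for the identifiers of the `rows` best matches."""
    params = {"q": build_query(item), "rows": rows, "fl": "id", "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    item = {"title": "The Cathedral of Santiago", "subject": ["architecture"],
            "description": "A view of the cathedral."}
    print(build_solr_request(item))
```

Because the terms are simply space-separated, Solr's standard query parser treats them as alternatives under its default OR operator and ranks the indexed items by relevance score, which is a plain bag-of-words retrieval of the kind the paragraph describes.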
Fig. 1. Web service interface. It consists of a text area for entering the input item in JSON format (top). The “Get EDM JSON example” button can be used to load an example input. Once a JSON record has been entered, click the “Process” button to get the output. The output (bottom) consists of a list of related items and background links.

2.2 Related Wikipedia Articles

To link the items to Wikipedia articles we follow an implementation similar to the method described in [3]. This method creates a dictionary, that is, an association between string mentions and all the possible articles each mention can refer to. Our dictionary is constructed from the titles of Wikipedia articles, the redirect pages, the disambiguation pages and the anchor texts of Wikipedia links. Mentions are lower-cased and all text between parentheses is removed. If a mention links to a disambiguation page, it is associated with all the articles the disambiguation page points to. In addition, each association between a mention and an article is scored with its prior probability, estimated from the number of times the mention occurs as the anchor text of links to that article. Note that such a dictionary can disambiguate any mention by simply returning the highest-scoring article for that mention.

Once the dictionary is built, the web service analyzes the title, subject and description fields of the CH item and matches the longest substrings within those fields against the dictionary entries. When a match is found, the highest-scoring Wikipedia article for that entry is returned. Note that the links to Wikipedia in the PATHS project are produced using more sophisticated methods; please refer to [1] for further details.
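The dictionary construction and matching steps of Section 2.2 can be sketched in Python as below. This is a minimal sketch, not the PATHS or Stanford-UBC implementation: the toy entries stand in for real Wikipedia titles, redirects, disambiguation pages and anchors, and the names `MentionDictionary` and `link` are hypothetical. Restricting mentions to word tokens and scanning the item text greedily from left to right are our assumptions about how the longest-substring match is carried out. The prior is taken as P(article | mention) = count(mention, article) / Σ count(mention, ·) over anchor occurrences, and is 0 when a mention has no anchor evidence.

```python
import re
from collections import Counter, defaultdict
from typing import List, Optional, Tuple


def normalise(mention: str) -> str:
    """Lower-case, drop text between parentheses, keep word tokens only."""
    without_parens = re.sub(r"\([^)]*\)", " ", mention.lower())
    # Keeping only \w+ tokens is an assumption; it makes mentions and item
    # text tokenise identically.
    return " ".join(re.findall(r"\w+", without_parens))


class MentionDictionary:
    """Maps normalised mentions to candidate articles with anchor counts."""

    def __init__(self) -> None:
        self.counts = defaultdict(Counter)  # mention -> Counter(article -> anchor count)
        self.max_len = 0  # longest mention, in tokens

    def _entry(self, mention: str) -> Optional[Counter]:
        key = normalise(mention)
        if not key:
            return None
        self.max_len = max(self.max_len, len(key.split()))
        return self.counts[key]

    def add_association(self, mention: str, article: str) -> None:
        """Article title or redirect: association without anchor evidence."""
        entry = self._entry(mention)
        if entry is not None:
            entry.setdefault(article, 0)

    def add_disambiguation(self, mention: str, articles: List[str]) -> None:
        """Associate the mention with every article the page points to."""
        for article in articles:
            self.add_association(mention, article)

    def add_anchor(self, anchor_text: str, article: str) -> None:
        """Record one occurrence of anchor_text linking to article."""
        entry = self._entry(anchor_text)
        if entry is not None:
            entry[article] += 1

    def best_article(self, mention: str) -> Optional[str]:
        """Return the article with the highest prior P(article | mention)."""
        candidates = self.counts.get(normalise(mention))
        if not candidates:
            return None
        total = sum(candidates.values())
        # Prior = anchor count / all anchors for this mention (0 if none seen).
        return max(candidates, key=lambda a: candidates[a] / total if total else 0.0)


def link(text: str, dictionary: MentionDictionary) -> List[Tuple[str, str]]:
    """Greedy left-to-right longest match of item text against the dictionary."""
    tokens = re.findall(r"\w+", text.lower())
    links, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, shrinking until a mention matches.
        for n in range(min(dictionary.max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            article = dictionary.best_article(span)
            if article is not None:
                links.append((span, article))
                i += n
                break
        else:
            i += 1  # no mention starts at this token
    return links
```

For example, after adding the title "Santiago de Compostela Cathedral", a disambiguation page "Santiago (disambiguation)" and a few "Santiago" anchors, `link("Pilgrims at Santiago de Compostela Cathedral", d)` prefers the four-token cathedral entry over the shorter "santiago". Because the score is a context-free prior, a bare "santiago" always resolves to its most frequent anchor target regardless of the surrounding words.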
3 Conclusions and Future Work

This paper presents a web service prototype which automatically enriches CH items with metadata. The web service is inspired by the enrichment work carried out in the PATHS project but, unlike the batch methodology used in the project, it performs the enrichment online. The prototype has been designed for demonstration purposes, to showcase the feasibility of providing full-fledged automatic enrichment.

Our plans for the future include moving the offline enrichment services currently being evaluated in the PATHS project to the web service. For related Wikipedia articles, we will take into account the context of the matched entities, which improves the quality of the links [4], and we will include a filtering algorithm to discard entities that are not relevant. Regarding related items, we will classify them according to the type of relation [5]. In addition, we plan to organize the items hierarchically in an automatic way, according to a Wikipedia-based vocabulary [6].

Acknowledgements. The research leading to these results was carried out as part of the PATHS project (http://www.paths-project.eu), funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270082. This work has also been funded by the Basque Government (project IBILBIDE, SAIOTEK S-PE12UN089).

References

1. Otegi, A., Agirre, E., Soroa, A., Aletras, N., Chandrinos, C., Fernando, S., Gonzalez-Agirre, A.: Report accompanying D2.2: Processing and Representation of Content for Second Prototype. PATHS Project Deliverable (2012), http://www.paths-project.eu/eng/content/download/2489/18113/version/2/file/D2.2.Content+Processing-2nd+Prototype-revised.v2.pdf
2. Griffiths, J., Goodale, P., Minelli, S., de Polo, A., Agerri, R., Soroa, A., Hall, M., Bergheim, S.R., Chandrinos, K., Chryssochoidis, G., Fernie, K., Usher, T.: D5.1: Evaluation of the first PATHS prototype. PATHS Project Deliverable (2012), http://www.paths-project.eu/eng/Resources/D5.1-Evaluation-of-the-1st-PATHS-Prototype
3. Chang, A.X., Spitkovsky, V.I., Yeh, E., Agirre, E., Manning, C.D.: Stanford-UBC entity linking at TAC-KBP. In: Proceedings of TAC 2010, Gaithersburg, Maryland, USA (2010)
4. Han, X., Sun, L.: A Generative Entity-Mention Model for Linking Entities with Knowledge Base. In: Proceedings of the ACL, Portland, Oregon, USA (2011)
5. Agirre, E., Aletras, N., Gonzalez-Agirre, A., Rigau, G., Stevenson, M.: UBC UOS-TYPED: Regression for typed-similarity. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Atlanta, Georgia, USA (2013)
6. Fernando, S., Hall, M., Agirre, E., Soroa, A., Clough, P., Stevenson, M.: Comparing Taxonomies for Organising Collections of Documents. In: Proceedings of COLING 2012, Mumbai, India (2013)