Improving library services with semantic web technology in the realm of repositories

Improving Library Services with
Semantic Web Technology
- in the realm of Repository Systems
Dr. Timo Borst

Head of IT Development
German National Library of Economics /
Leibniz Information Centre for Economics
Kiel/Hamburg, Germany

ICDK 2011
14th – 16th February, Gurgaon/India

ZBW is a member of the Leibniz Association

Overview
1. Current situation: Distributed (meta-)data management in library
applications

2. Popular approaches towards aggregation and homogeneity of
metadata

3. Our approach: Integration and aggregation of authority values
with Semantic Web technology
a) General idea
b) Use case: Indexing
c) Use case: Retrieving

4. “Lightweight” integration into existing repository systems and
service providers

5. Conclusion


Current situation

• The rise of repository systems for academic publishing…

• …has led to a landscape of distributed systems, each of them
holding its own metadata…

• …which is harvested and aggregated by service providers


Popular approaches towards aggregation and
homogeneity of metadata
• Normalization in advance (before harvesting) requires

• a mandatory metadata scheme to be applied by the local repositories
• a set of controlled vocabularies (e.g. for publication types)
• an automatic validation of the harvested metadata

• Normalization afterwards (after harvesting) requires

• the definition of a minimum set of metadata fields
• the definition of a basic intermediate metadata scheme for normalizing
the heterogeneous metadata records
• optionally, data cleansing strategies such as name disambiguation and
automatic indexing on the basis of thesauri

Both approaches are problematic and still leave ambiguities at the aggregation level!
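To make the two strategies concrete, here is a minimal Python sketch. The field names, the publication-type vocabulary and the sample records are hypothetical, chosen only for illustration. It also shows why both approaches leave ambiguity behind: two spellings of the same author pass validation and survive normalization as different strings.

```python
# Hypothetical controlled vocabulary and mandatory fields (ex ante, before harvesting).
PUBLICATION_TYPES = {"article", "working paper", "book"}
MANDATORY_FIELDS = {"title", "creator", "type"}

def validate(record: dict) -> list[str]:
    """Ex ante: report violations of the mandatory scheme and vocabulary."""
    errors = [f"missing field: {f}" for f in sorted(MANDATORY_FIELDS - record.keys())]
    if "type" in record and record["type"] not in PUBLICATION_TYPES:
        errors.append(f"uncontrolled type: {record['type']}")
    return errors

# Hypothetical mapping of local field names onto a minimal intermediate scheme (ex post).
FIELD_MAP = {"dc.title": "title", "dc.contributor.author": "creator",
             "dc.creator": "creator", "dc.type": "type"}

def normalize(local_record: dict) -> dict:
    """Ex post: keep only the minimum set of fields, under common names."""
    return {FIELD_MAP[k]: v for k, v in local_record.items() if k in FIELD_MAP}

a = normalize({"dc.title": "Paper A", "dc.creator": "Smith, John", "dc.type": "article"})
b = normalize({"dc.title": "Paper B", "dc.contributor.author": "J. Smith",
               "dc.type": "article", "local.note": "dropped"})
print(validate(a), validate(b))      # prints: [] []
print(a["creator"] == b["creator"])  # prints: False (same person, two strings)
```

Neither check can tell that "Smith, John" and "J. Smith" may denote the same person; that requires authority data.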


Current situation

• …sounds easy and straightforward, but implies
severe problems, especially with regard to the
ambiguity of
• author names
• subject headings


Current situation
“The major difficulty we have found is with DSpace’s handling of
metadata. While we feel that the number of fields in Dublin Core is
adequate for most if not all uses (DCMI Usage Board 2006), we are
troubled by the lack of authority control when completing its fields.
Without some control over uniform titles, authors and subjects
accessing the items in the future will [be] very problematic.”
S. Chabot (http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/)
“Neither the standards nor the software underlying
institutional repositories anticipated performing naming
authority control on widely disparate metadata from
highly unreliable sources.”
D. Salo (http://minds.wisconsin.edu/handle/1793/31735)


Our approach: Integration of authority values with
Semantic Web technology

• General idea: “Provide a framework for integrating authority
data, which is both normative and flexible enough to tolerate
local idiosyncrasies on a string level.”
• Approach: Concept modelling based on Semantic Web / SKOS
standards
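As an illustration of the general idea, the sketch below models one concept in a SKOS-like structure using plain Python dataclasses instead of an RDF library. The concept URI and the German label "Finanzkrise" come from the web service example in this talk; the English preferred label is assumed, and the alternative label is a hypothetical variant added only to show how different strings resolve to one normative identifier.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """Minimal stand-in for a skos:Concept: one URI, many labels."""
    uri: str
    pref_labels: dict[str, str]  # language -> skos:prefLabel
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # language -> skos:altLabel

    def all_labels(self):
        yield from self.pref_labels.values()
        for labels in self.alt_labels.values():
            yield from labels

FINANZKRISE = Concept(
    uri="http://zbw.eu/stw/descriptor/19664-4",
    pref_labels={"de": "Finanzkrise", "en": "Financial crisis"},  # English label assumed
    alt_labels={"en": ["Financial crises"]},  # hypothetical variant, not taken from STW
)

def build_label_index(concepts):
    """Map every case-folded label string to the URI of its concept."""
    return {label.casefold(): c.uri for c in concepts for label in c.all_labels()}

index = build_label_index([FINANZKRISE])
print(index["finanzkrise"] == index["financial crises"])  # prints: True
```

The local string is kept as entered; only the URI has to be shared across systems.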


Semantic Web technology


Semantic Web technology – Web service
Example queries (for concepts):

http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
…delivers all terms beginning with “finanzkr”

http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php?service=labels&concept=http://zbw.eu/stw/descriptor/19664-4&lang=en
…delivers all English synonyms of the German “Finanzkrise”
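A client could assemble the two example queries as shown below. The base URLs and parameter names are copied from the slide; these were 2011 beta endpoints and may no longer be online, so the sketch only builds URLs and does not assume any response format. Percent-encoding the concept URI is a choice made here; the slide shows it unencoded, and we assume the service accepts the encoded form.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoints as given on the slide (beta services; availability not assumed).
SUGGEST_URL = "http://zbw.eu/beta/stw-ws/suggest"
WRAPPER_URL = "http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php"

def suggest_url(prefix: str) -> str:
    """URL asking for all terms that begin with `prefix`."""
    return f"{SUGGEST_URL}?{urlencode({'query': prefix})}"

def labels_url(concept_uri: str, lang: str) -> str:
    """URL asking for all labels of `concept_uri` in language `lang`.

    urlencode percent-encodes the concept URI (":" -> %3A, "/" -> %2F).
    """
    params = {"service": "labels", "concept": concept_uri, "lang": lang}
    return f"{WRAPPER_URL}?{urlencode(params)}"

print(suggest_url("finanzkr"))  # prints: http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
url = labels_url("http://zbw.eu/stw/descriptor/19664-4", "en")
# Decoding the query string gives the original parameters back.
print(parse_qs(urlsplit(url).query)["concept"])  # prints: ['http://zbw.eu/stw/descriptor/19664-4']
```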


Use case: (Self-)Indexing
• One of the most prominent use cases, especially for librarians but also
for scientists and active users who are not familiar with subject-specific
vocabularies
• Main goals:
• Support the process of indexing in order to achieve a classification
of documents which is both coherent and flexible in the sense that
it permits local idiosyncrasies related to authority terms
• Align different vocabularies in the sense that indexing in one
vocabulary is automatically linked to another vocabulary
• Implementation: Extension of the submission interface of our repository by
integrating the terminology web service as an autosuggest function
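The autosuggest step could look roughly like this on the client side, assuming the web service's answers have already been loaded into (label, URI) pairs. Apart from the STW URI taken from the slides, the second term, the record layout and the alignment to "another vocabulary" (the example.org URIs) are hypothetical. The point is that the chosen term is stored together with its URI, so a link to a second vocabulary can be derived automatically.

```python
# (label, concept URI) pairs as a terminology service might return them.
# Only the STW URI is taken from the slides; the rest is hypothetical.
TERMS = [
    ("Finanzkrise", "http://zbw.eu/stw/descriptor/19664-4"),
    ("Financial crisis", "http://zbw.eu/stw/descriptor/19664-4"),
    ("Finanzmarkt", "http://example.org/stw/finanzmarkt"),  # hypothetical
]

# Hypothetical alignment of STW concepts to a second vocabulary (cf. skos:exactMatch).
EXACT_MATCH = {
    "http://zbw.eu/stw/descriptor/19664-4": "http://example.org/other-vocab/financial-crisis",
}

def suggest(prefix: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return (label, uri) pairs whose label starts with the typed prefix."""
    p = prefix.casefold()
    return [(label, uri) for label, uri in TERMS if label.casefold().startswith(p)][:limit]

def index_subject(record: dict, label: str, uri: str) -> None:
    """Store the chosen label with its URI, plus any aligned concept."""
    subject = {"value": label, "authority": uri}
    if uri in EXACT_MATCH:
        subject["exact_match"] = EXACT_MATCH[uri]
    record.setdefault("subjects", []).append(subject)

hits = suggest("finanzkr")  # one hit: "Finanzkrise"
record = {"title": "Some working paper"}
index_subject(record, *hits[0])
print(record["subjects"][0]["exact_match"])
# prints: http://example.org/other-vocab/financial-crisis
```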


Use case: (Self-)Indexing

Submission form https://econstor.eu


Use case: Retrieving
• Arguably the most important use case

• Often runs into the classic dilemma of precision versus
recall
• Main goal:
• Support the process of retrieving, so users can find the
relevant set of documents

• Implementation: Automatic expansion of the original query with
synonyms, narrower and related terms
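The expansion step can be sketched as follows. The thesaurus relations are hypothetical toy data (not STW content), and OR-ing quoted phrases is an assumption about the search engine's query syntax; a real implementation would fetch synonyms, narrower and related terms from the terminology web service. Each additional relation raises recall at the cost of precision, which is the dilemma named on this slide.

```python
# Hypothetical thesaurus fragment: label -> relation -> labels.
THESAURUS = {
    "financial crisis": {
        "synonyms": ["finanzkrise"],
        "narrower": ["banking crisis", "currency crisis"],
        "related": ["financial market"],
    },
}

def expand_query(term: str, relations=("synonyms", "narrower", "related")) -> str:
    """Expand a single-term query into an OR query over the original and linked terms."""
    entry = THESAURUS.get(term.casefold(), {})
    terms = [term]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    return " OR ".join(f'"{t}"' for t in terms)

print(expand_query("financial crisis", relations=("synonyms",)))
# prints: "financial crisis" OR "finanzkrise"
```

Restricting `relations` is one way to let users choose between a narrow (precise) and a broad (high-recall) search.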



Expanded search for “financial crisis” at http://econstor.eu


Use case 2: Search


Use case 2: Search


“Lightweight” integration into existing repository systems
and service providers


“Lightweight” integration into existing repository systems
and service providers
Benefits
• “Lightweight” extension of legacy systems
• Strategy of “least intrusion”: no update or migration needed
• No changes to the core system; only some changes to the data model
may be required:
• Additional column for storing the URI of the authority key
• Export or harvesting of the authority record as a resource must be possible
(→ OAI-ORE)

• Other types of library applications suitable for these adaptations:
• catalogues
• portals (e.g. to generate publication lists for an identified author or
on thematic issues)
• any collaborative system with an annotation function
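To illustrate the “least intrusion” point, the sketch below uses an in-memory SQLite database with a hypothetical metadata table (loosely modelled on a repository's field/value table, not on any specific system's schema). The only change is one additional nullable column for the authority URI: existing rows and column-naming queries keep working, and new entries can carry the URI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical legacy table: one row per metadata value of an item.
conn.execute("CREATE TABLE metadata_value (item_id INTEGER, field TEXT, value TEXT)")
conn.execute("INSERT INTO metadata_value VALUES (1, 'dc.subject', 'Finanzkrise')")

# The only schema change: an additional, nullable column for the authority URI.
conn.execute("ALTER TABLE metadata_value ADD COLUMN authority TEXT")

# New entries store the string as typed plus the URI chosen via autosuggest.
conn.execute(
    "INSERT INTO metadata_value VALUES (?, ?, ?, ?)",
    (2, "dc.subject", "Financial crisis", "http://zbw.eu/stw/descriptor/19664-4"),
)

# Legacy queries that name their columns are unaffected.
print(conn.execute("SELECT value FROM metadata_value ORDER BY item_id").fetchall())
# prints: [('Finanzkrise',), ('Financial crisis',)]

# Old rows simply have no authority yet and can be reconciled later.
print(conn.execute("SELECT item_id FROM metadata_value WHERE authority IS NULL").fetchall())
# prints: [(1,)]
```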


Summary and conclusion
• Library applications each generate and manage their own idiosyncratic data sets.
• This makes it harder to maintain, exchange, aggregate and homogenize the (meta)data for advanced services.
• Upstream web services, as part of an overarching authority data infrastructure, can help homogenize metadata at an early stage (while still allowing localization).
• If such web services become widespread and widely used, there is a chance of linking locally maintained metadata more extensively while also improving data-based services.
• The option of “lightweight integration” allows operators of library applications to integrate these web services into their applications with as little effort as possible.

Thank you!

Dr. Timo Borst
Deutsche Zentralbibliothek für
Wirtschaftswissenschaften /
Leibniz-Informationszentrum
Wirtschaft (ZBW)

t.borst@zbw.eu


Use case 3: Entering authors

• The standard case in catalogues, but so far the exception in other data entry systems
• User groups: librarians + scientists (?) + library users (?)
• Process: entering author names
• Goal: improve the process of entering authors with the help of authority data provided by web services
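One way the author entry step could use authority data is sketched below: the typed name is reduced to a simple comparison key (surname plus first initial, diacritics stripped) and looked up among authority records. The key function and the authority record (name and example.org URI) are hypothetical; real name authority files support richer matching, and keys matching several persons would have to be shown to the user rather than resolved automatically.

```python
import unicodedata

def name_key(name: str) -> str:
    """Reduce 'Surname, Forename' or 'Forename Surname' to 'surname|f'."""
    if not name.strip():
        return ""
    # Strip diacritics so that "Müller" and "Muller" yield the same key.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    if "," in name:
        surname, forenames = (part.strip() for part in name.split(",", 1))
    else:
        *forename_parts, surname = name.split()
        forenames = " ".join(forename_parts)
    return f"{surname.casefold()}|{forenames[:1].casefold()}"

# Hypothetical authority records: key -> list of (preferred name, URI).
AUTHORITIES = {
    "muller|h": [("Müller, Hans", "http://example.org/person/42")],
}

def candidates(typed_name: str) -> list[tuple[str, str]]:
    """Return authority candidates for a typed author name (may be several)."""
    return AUTHORITIES.get(name_key(typed_name), [])

print(candidates("Hans Müller") == candidates("Muller, H."))  # prints: True
```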


Use case 3: Entering authors
• Entry form at http://87.106.250.18/beta/econstor/


Previous approaches to aggregation &
homogenization
• Metadata search by aggregators
• Parallel querying of remote, distributed systems
• Returning and processing the search result as a combined hit list
• Harvesting
• Regular collection of remote, distributed metadata
• Homogenization ex ante or ex post
• Federated search
• …
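Harvesting in the repository world is commonly done via OAI-PMH. The sketch below builds a ListRecords request and extracts titles and subjects from an oai_dc response using only the Python standard library. The repository base URL and the inline sample response are made up for illustration; a real harvester would also follow resumptionTokens and handle OAI-PMH errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """OAI-PMH ListRecords request for one repository."""
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'metadataPrefix': metadata_prefix})}"

def extract_subjects(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (title, subjects) for each record in an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.iterfind(".//oai:record", NS):
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        subjects = [s.text for s in rec.iterfind(".//dc:subject", NS)]
        results.append((title, subjects))
    return results

# Made-up sample response (abridged; record headers omitted).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Paper A</dc:title>
      <dc:subject>Finanzkrise</dc:subject>
      <dc:subject>financial crisis</dc:subject>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
# prints: http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
print(extract_subjects(SAMPLE))
# prints: [('Paper A', ['Finanzkrise', 'financial crisis'])]
```

The two subject strings denote the same concept but arrive as unrelated strings, which is exactly the ambiguity that harvesting alone cannot resolve.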


References
• [1] http://wiki.dspace.org/index.php/Authority_Control_of_Metadata_Values
• [2] http://minds.wisconsin.edu/handle/1793/31735
• [3] http://dsug09.ub.gu.se/index.php/dsug/dsug09/paper/view/22/3
• [4] http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/
• [5] http://code.google.com/p/dspace-agrisap/wiki/ThesaurusAddOn
• [6] http://edoc.hu-berlin.de/conferences/dc-2008/subirats-imma-199/PDF/subirats.pdf
• [7] http://www.jisc.ac.uk/media/documents/programmes/sharedservices/names-phase-one-final-report,.pdf
• [8] http://idea.library.drexel.edu/bitstream/1860/3173/1/20070051011.pdf
• [9] http://ptsefton.com/blog/2006/06/06/the_affiliation_issue_in_institutional_repository_software/
• [10] http://library.ust.hk/info/nac/nac-technical.html
• [11] http://www.seco.tkk.fi/publications/2009/kurki-hyvonen-onki-people-2009.pdf
• [12] http://journals.sfu.ca/archivar/index.php/archivaria/article/download/11883/12836
• [13] http://www.dini.de/fileadmin/workshops/oa-netzwerk-juni2009/vernetzungstage_2009_malitz.pdf
