The document summarizes Dr. Timo Borst's presentation on improving library services with semantic web technology, specifically regarding repository systems.
[1] Current library applications generate and manage siloed metadata collections, making aggregation and standardization of metadata difficult for expanded services.
[2] Semantic web services as part of an overarching authority data infrastructure can help standardize metadata early in the process while still allowing for local differences.
[3] If these services are widely adopted and used, there is potential for greater networking of locally managed metadata while improving data-driven services through lightweight integration into existing library applications.
Improving library services with semantic web technology in the realm of repositories
1. Improving Library Services with
Semantic Web Technology
- in the realm of Repository Systems
Dr. Timo Borst
Head of IT Development
German National Library for Economics /
Leibniz-Information Centre Economics
Kiel/Hamburg, Germany
ICDK 2011
14th – 16th February, Gurgaon/India
The ZBW is a member of the Leibniz Association
2. Overview
1. Current situation: Distributed (meta-)data management in library
applications
2. Popular approaches towards aggregation and homogeneity of
metadata
3. Our approach: Integration and aggregation of authority values
with Semantic Web technology
a) General idea
b) Use case: Indexing
c) Use case: Retrieving
4. “Lightweight” integration into existing repository systems and
service providers
5. Conclusion
3. Current situation
• The rise of repository systems for academic publishing…
• …has led to a landscape of distributed systems, each of them
holding its own metadata…
• …which is harvested and aggregated by service providers
4. Popular approaches towards aggregation and
homogeneity of metadata
• Normalization in advance (before harvesting) requires
• a mandatory metadata scheme to be applied by the local repositories
• a set of controlled vocabularies (e.g. for publication types)
• an automatic validation of the harvested metadata
• Normalization afterwards (after harvesting) requires
• the definition of a minimum set of metadata fields
• the definition of a basic intermediate metadata scheme for normalizing
the heterogeneous metadata records,
• optionally data cleansing strategies like name disambiguation and
automatic indexing on the basis of thesauri
Both approaches are problematic and reveal ambiguities at the aggregation level!
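As an illustration of "normalization afterwards", the following minimal Python sketch maps one harvested record onto a small intermediate scheme and reports what it could not resolve. The field names in MINIMUM_FIELDS and the publication-type mapping are assumptions made for this example; they are not taken from the presentation.

```python
# Hypothetical minimum set of metadata fields (an assumption for this sketch).
MINIMUM_FIELDS = ("title", "creator", "type")

# Hypothetical controlled vocabulary: local publication-type strings mapped
# to one normalized value each.
PUBLICATION_TYPES = {
    "article": "article",
    "journal article": "article",
    "working paper": "workingPaper",
    "wp": "workingPaper",
}


def normalize_record(record):
    """Map one harvested record onto the intermediate scheme.

    Returns (normalized_record, problems), where problems lists every
    field that is missing or could not be mapped.
    """
    normalized = {}
    problems = []
    for field in MINIMUM_FIELDS:
        value = str(record.get(field, "")).strip()
        if not value:
            problems.append(f"missing field: {field}")
        normalized[field] = value
    if normalized["type"]:
        key = normalized["type"].lower()
        if key in PUBLICATION_TYPES:
            normalized["type"] = PUBLICATION_TYPES[key]
        else:
            problems.append(f"unknown publication type: {normalized['type']}")
    return normalized, problems
```

The list of problems is where the ambiguities show up: free-text values that no mapping anticipates can only be flagged after harvesting, not repaired.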
5. Current situation
• …sounds easy and straightforward, but implies
severe problems, especially with regard to
the ambiguity of
• author names
• subject headings
6. Current situation
“The major difficulty we have found is with DSpace’s handling of
metadata. While we feel that the number of fields in Dublin Core is
adequate for most if not all uses (DCMI Usage Board 2006), we are
troubled by the lack of authority control when completing its fields.
Without some control over uniform titles, authors and subjects
accessing the items in the future will [be] very problematic.”
S. Chabot (http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/)
“Neither the standards nor the software underlying
institutional repositories anticipated performing naming
authority control on widely disparate metadata from
highly unreliable sources.”
D. Salo (http://minds.wisconsin.edu/handle/1793/31735)
7. Our approach: Integration of authority values with
Semantic Web technology
• General idea: “Provide a framework for integrating authority
data, which is both normative and flexible enough to tolerate
local idiosyncrasies on a string level.”
• Approach: Concept modelling based on Semantic Web / SKOS
standards
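The general idea can be pictured with a small Python sketch: each concept has one normative URI, and any number of label strings resolve to it. The Concept class, its field names and the English alternative label are assumptions for illustration; they only loosely mirror the SKOS properties skos:prefLabel and skos:altLabel and are not the ZBW data model.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                                         # normative identifier
    pref_label: str                                  # loosely: skos:prefLabel
    alt_labels: list = field(default_factory=list)   # loosely: skos:altLabel


def build_label_index(concepts):
    """Index every known label string (case-insensitively) by concept URI."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()] = concept.uri
    return index


def resolve(local_string, index):
    """Return the concept URI for a locally entered string, or None."""
    return index.get(local_string.strip().casefold())


# The URI and the German label appear in the presentation's example query;
# the English alternative label is an assumed value for this sketch.
CONCEPTS = [
    Concept(
        uri="http://zbw.eu/stw/descriptor/19664-4",
        pref_label="Finanzkrise",
        alt_labels=["financial crisis"],
    ),
]
INDEX = build_label_index(CONCEPTS)
```

With this structure, resolve(" FINANZKRISE ", INDEX) and resolve("financial crisis", INDEX) point to the same URI, so local spelling differences stay local while the identifier stays normative.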
9. Our approach: Integration of authority values with
Semantic Web technology – Web service
Example queries (for concepts):
http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
…delivers all terms beginning with “finanzkr”
http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php?service=labels&concept=http://zbw.eu/stw/descriptor/19664-4&lang=en
…delivers all English synonyms of the German “Finanzkrise”
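The two example queries can be assembled programmatically. The Python sketch below only builds the URLs shown above; the helper names suggest_url and labels_url are hypothetical, the response format is not described on the slide and is therefore not handled, and the beta endpoints may no longer be reachable.

```python
from urllib.parse import urlencode

# Base path taken from the example queries on the slide.
BASE = "http://zbw.eu/beta/stw-ws"


def suggest_url(prefix):
    """Build the query URL for all terms beginning with the given prefix."""
    return f"{BASE}/suggest?" + urlencode({"query": prefix})


def labels_url(concept_uri, lang):
    """Build the query URL for the labels of one concept in one language."""
    params = {"service": "labels", "concept": concept_uri, "lang": lang}
    # safe=":/" leaves the concept URI unescaped, matching the slide's form.
    return f"{BASE}/stw-ws-wrapper.php?" + urlencode(params, safe=":/")
```

Calling suggest_url("finanzkr") and labels_url("http://zbw.eu/stw/descriptor/19664-4", "en") reproduces the two URLs from the slide.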
10. Use case: (Self-)Indexing
• One of the most prominent use cases, especially for librarians, but also
for scientists and active users not familiar with subject-specific
vocabularies
• Main goals:
• Support the process of indexing in order to achieve a classification
of documents which is both coherent and flexible in the sense that
it permits local idiosyncrasies related to authority terms
• Align different vocabularies in the sense that indexing in one
vocabulary is automatically linked to another vocabulary
• Implementation: Extension of the submission interface of our repository by
integrating the terminology web service as an autosuggest function
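What an autosuggest function has to do on the terminology side can be sketched in a few lines of Python. This is a local stand-in, not the ZBW web service: the sample labels are placeholders, and except for the "Finanzkrise" entry the terms and example.org URIs are invented for illustration.

```python
# (label, concept URI) pairs; only the first entry uses a URI from the slides.
SAMPLE_LABELS = [
    ("Finanzkrise", "http://zbw.eu/stw/descriptor/19664-4"),
    ("Finanzkredit", "http://example.org/concept/a"),   # hypothetical
    ("Geldpolitik", "http://example.org/concept/b"),    # hypothetical
]


def suggest(prefix, labels, limit=10):
    """Return up to `limit` (label, uri) pairs whose label starts with prefix.

    Matching is case-insensitive; results are sorted alphabetically by label.
    """
    needle = prefix.strip().casefold()
    if not needle:
        return []
    matches = [
        (label, uri)
        for label, uri in labels
        if label.casefold().startswith(needle)
    ]
    return sorted(matches)[:limit]
```

Returning the URI together with the label matters for the indexing goal: the submission form can display the string the user recognizes while storing the identifier that links the record to the vocabulary.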
12. Use case: Retrieving
• Can be considered the most important use case
• Often leads to the classic dilemma of precision versus
recall
• Main goal:
• Support the process of retrieving, so users can find the
relevant set of documents
• Implementation: Automatic expansion of the original query with
synonyms, narrower and related terms
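Automatic query expansion of this kind might look like the following Python sketch. The in-memory thesaurus and its terms are made-up placeholders; a real implementation would presumably fetch synonyms, narrower and related terms from the terminology web service.

```python
# Hypothetical thesaurus excerpt; all relations here are illustrative only.
THESAURUS = {
    "financial crisis": {
        "synonyms": ["finanzkrise"],
        "narrower": ["banking crisis"],
        "related": ["financial market"],
    },
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the original term followed by its synonyms, narrower and
    related terms, without duplicates."""
    term = query.strip().casefold()
    entry = thesaurus.get(term, {})
    expanded = [term]
    for relation in ("synonyms", "narrower", "related"):
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def to_boolean_query(terms):
    """Join the expanded terms into a simple OR query of quoted phrases."""
    return " OR ".join(f'"{term}"' for term in terms)
```

Every added term widens the result set, so expansion tends to raise recall at some cost to precision; restricting the expansion to synonyms only would be the more conservative setting.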
19. “Lightweight” integration into existing repository systems
and service providers
Benefits
• “Lightweight” extension of legacy systems
• Strategy of “least intrusion”: no update or migration needed
• No changes to the core system; only some changes to the data model
may be required:
• Additional column for storing the URI of the authority key
• Export or harvesting of the authority as a resource must be possible
(→ OAI-ORE)
• Other types of library applications suitable for these adaptations:
• catalogues
• portals (e.g. to generate publication lists from an identified author or
thematic issues)
• Any collaborative system with annotation system
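The data model change mentioned under Benefits, an additional column for the URI of the authority key, can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical and do not correspond to the schema of any particular repository system.

```python
import sqlite3

# In-memory database standing in for a legacy repository's metadata store.
conn = sqlite3.connect(":memory:")

# Hypothetical existing table: subject strings attached to items.
conn.execute(
    "CREATE TABLE subject (id INTEGER PRIMARY KEY, item_id INTEGER, label TEXT)"
)
conn.execute("INSERT INTO subject (item_id, label) VALUES (1, 'Finanzkrise')")

# The "least intrusion" change: one extra, nullable column for the URI.
conn.execute("ALTER TABLE subject ADD COLUMN authority_uri TEXT")

# Existing rows keep their local string and gain a link to the authority.
conn.execute(
    "UPDATE subject SET authority_uri = ? WHERE label = ?",
    ("http://zbw.eu/stw/descriptor/19664-4", "Finanzkrise"),
)

rows = conn.execute(
    "SELECT item_id, label, authority_uri FROM subject"
).fetchall()
```

Because the new column is nullable, records without an authority link remain valid and the core system needs no migration.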
20. Summary and conclusion
• Library applications each generate and manage their own
idiosyncratic data collections.
• This complicates the maintenance, exchange, aggregation and
homogenization of (meta-)data for extended services.
• Upstream web services, as part of an overarching authority data
infrastructure, can contribute to homogenizing metadata at an early
stage (while still allowing localization).
• If these web services emerge and are used widely, there is a
chance of further interlinking locally maintained metadata while
at the same time improving data-based services.
• The option of “lightweight integration” is an offer to operators of
library applications to integrate these web services into their
applications with as little effort as possible.
21. Thank you!
Dr. Timo Borst
Deutsche Zentralbibliothek für
Wirtschaftswissenschaften /
Leibniz-Informationszentrum
Wirtschaft (ZBW)
t.borst@zbw.eu
22. Use case 3: Recording authors
• The normal case in catalogues; so far the exception in other
cataloguing systems
• User groups: librarians + researchers (?) + library users (?)
• Process: entering author names
• Goal: to improve the process of recording authors with the help of
authority data provided by web services
24. Previous approaches to aggregation &
homogenization
• Metadata search through aggregators
• Parallel querying of remote, distributed systems
• Return and presentation of the search result as a
combined hit list
• Harvesting
• Regular collection of remote, distributed
metadata
• Homogenization ex ante or ex post
• Federated search
•…