Automatic creation of mappings between classification systems
Prof. Magnus Pfeffer
Stuttgart Media University
pfeffer@hdm-stuttgart.de
July 10th, 2013 European Conference on Data Analysis 2013 2
Agenda
• Motivation
• Instance-based matching
• Application to bibliographic data
• Evaluation
• Ongoing projects
Motivation
Current situation in Germany
• Five regional library unions
• Subject headings
  • Predominantly RSWK (“Regeln für den Schlagwortkatalog”, “Rules for the subject catalogue”), using a shared authority file
• Classification systems
  • RVK (Regensburg Union Classification)
  • BK (Basic Classification)
  • DDC (Dewey Decimal Classification)
  • Various local classification systems
• Low proportion of indexed titles (25-30%)
Current situation in Germany
• National library
• Subject headings
  • Predominantly RSWK (“Regeln für den Schlagwortkatalog”, “Rules for the subject catalogue”), using a shared authority file
• Classification systems
  • DDC (Dewey Decimal Classification)
  • Coarse categories
• DDC only for titles published since 2007
• Only “Reihe A” (print trade publications) is fully indexed with RSWK
Goals
• Re-use existing indexing information
• National level
  • BK is used mainly in northern Germany
  • RVK is used mainly in southern Germany
  • DDC is used mainly by the National Library
• International level
  • Make RVK data more accessible to DDC users
  • Use DDC indexing information available from e.g. the Library of Congress
Ideas
• Use of appropriate classification systems
• Faceted search in resource discovery systems
  • Should be monohierarchical
  • Should have a limited number of classes
  → DDC (first digits) or BK
• Browsing of similar titles
  • Should be fine-grained
  → DDC (full) or RVK
• Multi-lingual retrieval
Ideas
• Enable the use of existing tools and visualisations
[Figures: example visualisations from Denton (2012) and Legrady (2005)]
Instance-based Matching
Ontology matching
• Well-studied problem in computer science
• Several approaches
  • Based on the descriptors
  • Based on the structure
  • Based on the manifestations (instances)
Instances
• Entries in catalogues with multiple classifications
Instance-based matching
• Assumptions
  • Classes with semantic overlap appear together
  • The more often these classes co-occur, the stronger the overlap
• Preparation
  • Extraction of all pairs of classifications from the data
  • Count of the extracted pairs
Example
• Entry 1
  • DDC: 179.9
  • RVK: CC 7200
  • RVK: CC 7250
• Entry 2
  • DDC: 179.9
  • RVK: CC 7200
• Pairs
  • 179.9 / CC 7200
  • 179.9 / CC 7250
  • 179.9 / CC 7200
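The pair extraction for the two example entries can be sketched in a few lines of Python. This is only an illustration: the dictionary layout of an entry and the function name `count_pairs` are assumptions made here, not the data model of the software used in the talk.

```python
from collections import Counter
from itertools import product

# Hypothetical in-memory form of the two example entries.
entries = [
    {"DDC": ["179.9"], "RVK": ["CC 7200", "CC 7250"]},
    {"DDC": ["179.9"], "RVK": ["CC 7200"]},
]


def count_pairs(entries, system_a="DDC", system_b="RVK"):
    """Count how often each (class_a, class_b) pair occurs across all entries."""
    counts = Counter()
    for entry in entries:
        # Sets guard against a class being listed twice in the same entry.
        classes_a = set(entry.get(system_a, []))
        classes_b = set(entry.get(system_b, []))
        # Every class from system A is paired with every class from system B.
        counts.update(product(classes_a, classes_b))
    return counts


pair_counts = count_pairs(entries)
for (ddc, rvk), n in sorted(pair_counts.items()):
    print(f"{ddc} / {rvk}: {n}")
```

For the example this yields a count of 2 for 179.9 / CC 7200 and 1 for 179.9 / CC 7250, matching the three pairs listed above.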
Normalisation
• Comparing solely absolute numbers is misleading
  • Some classes are used more often than others
  • The number of pairs correlates with the number of entries that are classified using a given class
• Instead: use the proportion of co-occurrence to single occurrence

  $$\frac{\lvert E_{c_1} \cap E_{c_2} \rvert}{\lvert E_{c_1} \cup E_{c_2} \rvert}$$

  where $E_c$ is the set of entries classified with class $c$, i.e. the number of entries with both classifications divided by the number of entries with either classification (Jaccard measure for the overlap of sets)
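A minimal sketch of the Jaccard measure on sets of entry identifiers follows. The identifiers and the `entries_by_class` lookup are made up for illustration and reuse the classes of the earlier example. Returning 0.0 for two classes that are never used is a convention chosen here, not something the slides specify.

```python
def jaccard(entries_a, entries_b):
    """Jaccard measure |A ∩ B| / |A ∪ B| for two sets of entry identifiers."""
    union = entries_a | entries_b
    if not union:
        return 0.0  # convention chosen here: two unused classes do not overlap
    return len(entries_a & entries_b) / len(union)


# Hypothetical lookup: class notation -> identifiers of the entries that use it.
entries_by_class = {
    "179.9": {1, 2},
    "CC 7200": {1, 2},
    "CC 7250": {1},
}

score_7200 = jaccard(entries_by_class["179.9"], entries_by_class["CC 7200"])
score_7250 = jaccard(entries_by_class["179.9"], entries_by_class["CC 7250"])
print(score_7200, score_7250)
```

With these sets, 179.9 and CC 7200 share both entries (2 of 2), while 179.9 and CC 7250 share one of two.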
Further interpretation
• a and b are two classes from two classification systems A and B
• The classes a and b only occur together
  → exact match
• a only co-occurs with b, but b co-occurs with other classes from A
  → a is a narrower concept than b
• a co-occurs with several classes from B (including b)
  → a is a wider concept than b
• a and b do not co-occur
  → it cannot be inferred that a and b are unrelated
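These rules can be written down as a small decision function. The sketch below assumes that the co-occurring pairs are available as a set of `(class_from_A, class_from_B)` tuples; the function name and the return labels are illustrative. One case is not covered by the rules above: when both a and b co-occur with several classes of the other system, the sketch returns "related", which is an assumption made here. As a consequence, "wider" is returned only when b co-occurs with no class other than a.

```python
def interpret(a, b, pairs):
    """Interpret the relation of class a (system A) to class b (system B).

    pairs is a set of (class_from_A, class_from_B) tuples that co-occur
    in at least one entry.
    """
    partners_of_a = {y for x, y in pairs if x == a}
    partners_of_b = {x for x, y in pairs if y == b}

    if b not in partners_of_a:
        return "unknown"   # no co-occurrence: unrelatedness cannot be inferred
    if partners_of_a == {b} and partners_of_b == {a}:
        return "exact"     # a and b only occur together
    if partners_of_a == {b}:
        return "narrower"  # b also co-occurs with other classes from A
    if partners_of_b == {a}:
        return "wider"     # a also co-occurs with other classes from B
    return "related"       # assumption: both sides have further partners


# Hypothetical co-occurrence data.
pairs = {
    ("a1", "b1"),
    ("a2", "b2"), ("a3", "b2"),
    ("a4", "b3"), ("a4", "b4"),
    ("a5", "b5"), ("a5", "b6"), ("a6", "b5"),
}

print(interpret("a1", "b1", pairs))  # exact
print(interpret("a2", "b2", pairs))  # narrower
print(interpret("a4", "b3", pairs))  # wider
print(interpret("a1", "b2", pairs))  # unknown
```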
Prior work
• Pfeffer (2009)
  • Analysis of classification system structure and actual use
  • Locating classes that describe the same concept
  • Finding ways to improve existing mappings to RVK
  • Focus on RVK, using data from library union catalogues
  • Co-occurrence analysis
• Results
  • High co-occurrence and close in the hierarchy
    → classes are hard to assign properly
  • High co-occurrence and far apart in the hierarchy
    → classes describe identical concepts
  • Mappings from RSWK to RVK could be augmented
Related work
• Isaac et al. (2007)
  • Applied instance-based matching to bibliographic data
  • Data from the National Library of the Netherlands
  • Mapping from a thesaurus to a classification system
• Results
  • Generated mappings are quite good
  • More sophisticated measures than Jaccard do not lead to better mappings
Application to bibliographic data
Bibliographic data is different
• Multiple editions
• Multiple document types
Bibliographic data
• Skewed data
  • Multiple editions → more pairs
  • Some co-occurrences could appear stronger than others
• Solution: pre-clustering individual titles on the “work” level
  • Increases the chance of instances with more than one classification
  • Each cluster contributes only once
• Allows using absolute co-occurrence numbers
  • Cut-off for small numbers
  • Ranking of competing matches
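Counting on the cluster level, with a cut-off and a ranking of competing matches, might look like the sketch below. It assumes each cluster is already reduced to its consolidated sets of classes from both systems. The threshold `min_count=2`, the function name and the sample clusters (including the notation CC 7300) are invented for illustration; the slides do not give concrete values.

```python
from collections import Counter, defaultdict
from itertools import product


def rank_matches(clusters, min_count=2):
    """Rank candidate target classes per source class by cluster-level counts.

    clusters is an iterable of (classes_a, classes_b) set pairs, one per work
    cluster, so that every cluster contributes each pair only once.
    """
    counts = Counter()
    for classes_a, classes_b in clusters:
        counts.update(product(set(classes_a), set(classes_b)))

    ranked = defaultdict(list)
    for (a, b), n in counts.items():
        if n >= min_count:  # cut-off for small numbers
            ranked[a].append((b, n))
    for a in ranked:
        # Highest count first; ties are broken by notation for stable output.
        ranked[a].sort(key=lambda item: (-item[1], item[0]))
    return dict(ranked)


# Hypothetical work clusters with consolidated DDC and RVK classes.
clusters = [
    ({"179.9"}, {"CC 7200", "CC 7250"}),
    ({"179.9"}, {"CC 7200"}),
    ({"179.9"}, {"CC 7200"}),
    ({"179.9"}, {"CC 7250"}),
    ({"179.9"}, {"CC 7300"}),
]

ranking = rank_matches(clusters)
print(ranking)
```

In this sample, CC 7200 ranks first with three clusters, CC 7250 second with two, and CC 7300 falls below the cut-off.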
Prior work
• Pfeffer (2013)
• Matching bibliographic records
  • Based on author, title and uniform title (as well as information on title changes)
  • Matches any edition and revision of a work, including translations
  • Full closure over the match sets → discrete clusters
• Consolidating indexing information
  • For indexing purposes, the differences between editions and revisions are irrelevant
  • Subject headings and classifications are shared between all members of a cluster
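One way to obtain discrete clusters from overlapping match sets is a union-find (disjoint-set) structure, sketched below. This is a generic technique swapped in for illustration and not necessarily how Pfeffer (2013) implements the closure. The record identifiers, the match keys and the class assignments are hypothetical.

```python
from collections import defaultdict


def cluster_records(record_keys):
    """Group records that share a match key, directly or transitively.

    record_keys maps a record id to the set of match keys generated for it.
    Returns a list of sets of record ids (the discrete clusters).
    """
    parent = {rid: rid for rid in record_keys}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    first_seen = {}  # match key -> first record that produced it
    for rid, keys in record_keys.items():
        for key in keys:
            if key in first_seen:
                union(first_seen[key], rid)
            else:
                first_seen[key] = rid

    clusters = defaultdict(set)
    for rid in record_keys:
        clusters[find(rid)].add(rid)
    return list(clusters.values())


def consolidate(cluster, classes_by_record):
    """Union of all classifications assigned to any member of the cluster."""
    merged = set()
    for rid in cluster:
        merged |= classes_by_record.get(rid, set())
    return merged


# Hypothetical records: r2 links r1 and r3 through two different keys.
record_keys = {
    "r1": {"key-original"},
    "r2": {"key-original", "key-translation"},
    "r3": {"key-translation"},
    "r4": {"key-other-work"},
}
classes_by_record = {"r1": {"179.9"}, "r3": {"CC 7200"}}

clusters = cluster_records(record_keys)
shared = [consolidate(c, classes_by_record) for c in clusters if "r2" in c][0]
print(sorted(sorted(c) for c in clusters), sorted(shared))
```

Here r1 and r3 never share a key directly, yet they end up in one cluster via r2, and every member of that cluster inherits both classifications.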
Evaluation
Comparison with existing mappings
• Existing (partial) mappings can be used as a basis for evaluation
  → “gold standard”
• Comparison of automatic and manual mapping
  • Recall: Are all the mappings found?
  • Precision: Are all found mappings correct?
• Analysis of additional links
  • Maybe the gold standard can be improved?
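Precision and recall against a gold standard can be computed on sets of mapping pairs, as in the following sketch. The function name and the sample mappings are hypothetical. Because the gold standard is only partial, the additional links are returned separately for manual inspection instead of being treated as plain errors.

```python
def evaluate(generated, gold):
    """Compare generated mappings with a gold standard.

    Both arguments are collections of (source_class, target_class) pairs.
    Returns (precision, recall, additional_links).
    """
    generated, gold = set(generated), set(gold)
    correct = generated & gold
    precision = len(correct) / len(generated) if generated else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    additional = generated - gold  # candidates for improving the gold standard
    return precision, recall, additional


# Hypothetical mappings between two systems.
gold = {("A", "x"), ("B", "y"), ("C", "z")}
generated = {("A", "x"), ("B", "y"), ("D", "w")}

precision, recall, additional = evaluate(generated, gold)
print(f"precision={precision:.2f} recall={recall:.2f} additional={additional}")
```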
Ongoing projects
Data
• Bibliographic data
  • All German library union catalogues
  • German National Library catalogue
  • Austrian National Library catalogue
  • British National Bibliography
• Gold standards
  • Partial mappings BK ↔ RVK
Interesting mappings
• RVK → BK
  • Gold standard exists
  • BK is well suited for faceted retrieval
  • RVK has the largest proportion of classified titles
• RVK ↔ DDC
  • Enable data sharing between the German National Library and the RVK-using libraries
• Not limited to classification systems
  • See Pfeffer (2009) and Wang et al. (2009)
Software features
• Using the “Metafactory” framework
• Import of MAB2 and MARC data
• Clustering
  • Generation of keys for the match process
  • Matching and clustering (full closure)
  • Consolidation of indexing and classification information
• Statistics
  • Single occurrence and co-occurrence counts
  • Jaccard measure
• Output
  • Full mappings
Thank you for listening.
Slides available online
http://www.slideshare.net/MagnusPfeffer/
This work is licensed under a Creative Commons
Attribution-ShareAlike 3.0 Unported License.
References
• Denton, W. (2012). On dentographs, a new method of visualizing library collections. In: Code4Lib.
• Isaac, A., Van Der Meij, L., Schlobach, S. and Wang, S. (2007). An empirical study of instance-based ontology matching. In: The Semantic Web (pp. 253-266). Springer Berlin Heidelberg.
• Legrady, G. (2005). Making visible the invisible. Seattle Library Data Flow Visualization. In: Digital Culture and Heritage. Proceedings of ICHIM05 Sept, 21-23.
• Pfeffer, M. (2009). Äquivalenzklassen – Alle Doppelstellen der RVK finden. Presentation given at the Librarian Workshop of the 33rd Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications (GfKl).
• Pfeffer, M. (2013). Using clustering across union catalogues to enrich entries with indexing information. In: Data Analysis, Machine Learning and Knowledge Discovery - Proceedings of the 36th Annual Conference of the Gesellschaft für Klassifikation e. V. Springer Berlin Heidelberg.
• Wang, S., Isaac, A., Schopman, B., Schlobach, S. and Van Der Meij, L. (2009). Matching multi-lingual subject vocabularies. In: Research and Advanced Technology for Digital Libraries (pp. 125-137). Springer Berlin Heidelberg.

Tools für das Management von Forschungsdaten
 
20130527 library linkeddata
20130527 library linkeddata20130527 library linkeddata
20130527 library linkeddata
 

Kürzlich hochgeladen

RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataExhibitors Data
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityEric T. Tung
 
HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsMichael W. Hawkins
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Delhi Call girls
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.Aaiza Hassan
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...Aggregage
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsP&CO
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Servicediscovermytutordmt
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Centuryrwgiffor
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear RegressionRavindra Nath Shukla
 

Kürzlich hochgeladen (20)

RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors Data
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League City
 
HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael Hawkins
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and pains
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pillsMifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Century
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 

Automatic creation of mappings between classification systems

  • 1. Automatic creation of mappings between classification systems. Prof. Magnus Pfeffer, Stuttgart Media University, pfeffer@hdm-stuttgart.de
  • 2. Agenda (July 10th, 2013, European Conference on Data Analysis 2013): Motivation; Instance-based matching; Application to bibliographic data; Evaluation; Ongoing projects.
  • 3. Motivation
  • 4. Current situation in Germany: five regional library unions. Subject headings: predominantly RSWK („Regeln für den Schlagwortkatalog“, "Rules for the subject catalogue") using a shared authority file. Classification systems: RVK (Regensburg Union Classification), BK (Basic Classification), DDC (Dewey Decimal Classification) and various local classification systems. Low proportion of indexed titles (25-30%).
  • 5. Current situation in Germany: National Library. Subject headings: predominantly RSWK („Regeln für den Schlagwortkatalog“, "Rules for the subject catalogue") using a shared authority file. Classification systems: DDC (Dewey Decimal Classification) and coarse categories; DDC only for titles published since 2007. Only „Reihe A“ (print trade publications) is fully indexed with RSWK.
  • 6. Goals: re-use existing indexing information. National level: BK is used mainly in northern Germany, RVK mainly in southern Germany, DDC mainly by the National Library. International level: make RVK data more accessible to DDC users; use DDC indexing information available from e.g. the Library of Congress.
  • 7. Ideas: use of appropriate classification systems. Faceted search in resource discovery systems: should be monohierarchical and should have a limited number of classes → DDC (first digits) or BK. Browsing of similar titles: should be fine-grained → DDC (full) or RVK. Multi-lingual retrieval.
  • 8. Ideas: enable the use of existing tools and visualisations (Denton, 2012; Legrady, 2005).
  • 9. Instance-based matching
  • 10. Ontology matching: a well-studied problem in computer science. Several approaches: based on the descriptors, based on the structure, or based on the manifestations (instances).
  • 11. Instances: entries in catalogues with multiple classifications.
  • 12. Instance-based matching. Assumptions: classes with semantic overlap appear together; the more often these classes co-occur, the stronger the overlap. Preparation: extraction of all pairs of classifications from the data; count of the extracted pairs.
  • 13. Example
  • 14. Example. Entry 1: DDC 179.9, RVK CC 7200, RVK CC 7250. Entry 2: DDC 179.9, RVK CC 7200. Pairs: 179.9 / CC 7200, 179.9 / CC 7250, 179.9 / CC 7200.
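The pair extraction and counting step from the example can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the record layout (a dict with `ddc` and `rvk` lists) and the function name `count_pairs` are assumptions made for this sketch.

```python
from collections import Counter
from itertools import product


def count_pairs(entries):
    """Count how often each (DDC, RVK) pair occurs across all entries.

    The entry layout ({"ddc": [...], "rvk": [...]}) is a hypothetical
    structure chosen for this sketch.
    """
    pair_counts = Counter()
    for entry in entries:
        # Every DDC class of an entry is paired with every RVK class of it.
        for pair in product(set(entry["ddc"]), set(entry["rvk"])):
            pair_counts[pair] += 1
    return pair_counts


# The two entries from the example slide.
entries = [
    {"ddc": ["179.9"], "rvk": ["CC 7200", "CC 7250"]},
    {"ddc": ["179.9"], "rvk": ["CC 7200"]},
]
pair_counts = count_pairs(entries)
```

For the two example entries, the pair 179.9 / CC 7200 is counted twice and 179.9 / CC 7250 once, matching the three pairs listed on the slide.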
  • 15. Normalisation: comparing absolute numbers alone is misleading. Some classes are used more often than others, and the number of pairs correlates with the number of entries that are classified using a given class. Instead, use the proportion of co-occurrence to single occurrence: $\frac{|E_{c_1} \cap E_{c_2}|}{|E_{c_1} \cup E_{c_2}|}$, where $E_c$ is the set of entries classified with class $c$. This is the number of entries with both classifications divided by the number of entries with either classification (Jaccard measure for overlap of sets).
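As a small worked sketch of the Jaccard measure, assume each class is represented by the set of identifiers of the entries that carry it. The function name `jaccard` and the entry identifiers 1 and 2 are illustrative assumptions; the sets correspond to the two-entry example above.

```python
def jaccard(entries_c1, entries_c2):
    """Entries with both classes divided by entries with either class."""
    union = entries_c1 | entries_c2
    if not union:
        # Neither class is used at all; treat the overlap as zero.
        return 0.0
    return len(entries_c1 & entries_c2) / len(union)


# Entry ids per class, taken from the two-entry example:
ddc_179_9 = {1, 2}      # both entries carry DDC 179.9
rvk_cc_7200 = {1, 2}    # both entries carry RVK CC 7200
rvk_cc_7250 = {1}       # only entry 1 carries RVK CC 7250

strong = jaccard(ddc_179_9, rvk_cc_7200)  # 2 / 2 = 1.0
weak = jaccard(ddc_179_9, rvk_cc_7250)    # 1 / 2 = 0.5
```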
  • 16. Further interpretation: a and b are two classes from two classification systems A and B. The classes a and b only occur together → exact match. a only co-occurs with b, but b co-occurs with other classes from A → a is a narrower concept than b. a co-occurs with several classes from B (including b) → a is a wider concept than b. a and b do not co-occur → one cannot infer that a and b are unrelated.
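The interpretation rules can be approximated with co-occurrence partner sets. The following is a rough sketch under stated assumptions: "only occur together" is simplified to "each class has the other as its only partner", and the label `"related"` for the case where both classes have several partners is an addition of this sketch, since the slide does not cover that case. All function names are hypothetical.

```python
from collections import defaultdict


def build_partners(pairs):
    """Map each class to the set of classes it co-occurs with."""
    partners_of_a = defaultdict(set)  # class from system A -> classes from B
    partners_of_b = defaultdict(set)  # class from system B -> classes from A
    for a, b in pairs:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)
    return partners_of_a, partners_of_b


def interpret(a, b, partners_of_a, partners_of_b):
    """Classify the relation of a (system A) to b (system B)."""
    if b not in partners_of_a.get(a, set()):
        return "unknown"   # no co-occurrence: nothing can be inferred
    a_only_b = partners_of_a[a] == {b}
    b_only_a = partners_of_b[b] == {a}
    if a_only_b and b_only_a:
        return "exact"
    if a_only_b:
        return "narrower"  # b also co-occurs with other classes from A
    if b_only_a:
        return "wider"     # a also co-occurs with other classes from B
    return "related"       # assumption of this sketch, not from the slide


pairs = [("179.9", "CC 7200"), ("179.9", "CC 7250")]
partners_of_a, partners_of_b = build_partners(pairs)
relation = interpret("179.9", "CC 7200", partners_of_a, partners_of_b)
```

With only the two example entries as data, DDC 179.9 comes out as the wider concept relative to RVK CC 7200, because it also co-occurs with CC 7250. A real data set would of course yield a different partner structure.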
  • 17. Prior work: Pfeffer (2009). Analysis of classification system structure and actual use: locating classes that describe the same concept; finding ways to improve existing mappings to RVK. Focus on RVK, using data from library union catalogues; co-occurrence analysis. Results: high co-occurrence and close in the hierarchy → classes are hard to assign properly; high co-occurrence and far apart in the hierarchy → classes describe identical concepts. Mappings from RSWK to RVK could be augmented.
  • 18. Related work: Isaac et al. (2007). Applied instance-based matching to bibliographic data; data from the National Library of the Netherlands; mapping from a thesaurus to a classification system. Results: generated mappings are quite good; more sophisticated measures than Jaccard do not lead to better mappings.
  • 19. Application to bibliographic data
  • 20. Bibliographic data is different: multiple editions; multiple document types.
  • 21. Bibliographic data. Skewed data: multiple editions → more pairs; some co-occurrences could appear stronger than others. Solution: pre-clustering individual titles on the "work" level. This increases the chance of instances with more than one classification; each cluster contributes only once; it allows using absolute co-occurrence numbers, with a cut-off for small numbers and a ranking of competing matches.
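Counting per cluster, applying a cut-off and ranking competing matches could look roughly like the sketch below. The function name `rank_matches`, the input layout (one collection of pairs per work cluster) and the threshold `min_count=2` are assumptions for illustration; the slides do not state a concrete cut-off value, and the sample clusters are made up.

```python
from collections import Counter, defaultdict


def rank_matches(cluster_pairs, min_count=2):
    """Rank target classes per source class by cluster-level co-occurrence.

    cluster_pairs holds one collection of (source, target) pairs per
    cluster. min_count is a hypothetical cut-off for small numbers.
    """
    counts = Counter()
    for pairs in cluster_pairs:
        # Each cluster contributes a given pair only once.
        counts.update(set(pairs))

    ranked = defaultdict(list)
    for (source, target), n in counts.items():
        if n >= min_count:
            ranked[source].append((target, n))
    for source in ranked:
        # Highest count first; ties are broken alphabetically.
        ranked[source].sort(key=lambda item: (-item[1], item[0]))
    return dict(ranked)


# Made-up clusters for illustration only.
clusters = [
    [("179.9", "CC 7200"), ("179.9", "CC 7250")],
    [("179.9", "CC 7200")],
    [("179.9", "CC 7200"), ("179.9", "CC 7200")],  # duplicate within a cluster
    [("179.9", "CC 7250")],
]
ranking = rank_matches(clusters)
```

Here CC 7200 is ranked ahead of CC 7250 as a match for 179.9 (three clusters versus two), and the duplicate pair inside the third cluster is counted only once.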
  • 22. Prior work: Pfeffer (2013). Matching bibliographic records: based on author, title and uniform title (as well as information on title changes); matches any edition and revision of a work, including translations; full closure over the match sets → discrete clusters. Consolidating indexing information: for indexing purposes, the differences between editions and revisions are irrelevant; subject headings and classifications are shared between all members of a cluster.
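One common way to compute a full closure over match sets is a union-find (disjoint-set) structure; the slides do not say which technique was actually used, so the following is only a sketch. The record layout with `id`, `keys` and `classes` fields, the key strings and the sample records are invented for illustration.

```python
def cluster_records(records):
    """Group records that share any match key, transitively (union-find)."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # match key -> id of the first record carrying it
    for rec in records:
        for key in rec["keys"]:
            if key in first_seen:
                root_a, root_b = find(first_seen[key]), find(rec["id"])
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                first_seen[key] = rec["id"]

    clusters = {}
    for rec in records:
        clusters.setdefault(find(rec["id"]), []).append(rec)
    return list(clusters.values())


def consolidate(cluster):
    """Union of all classifications found in one cluster."""
    shared = set()
    for rec in cluster:
        shared |= set(rec["classes"])
    return shared


# Invented records: 1 and 3 share no key directly but are linked via 2.
records = [
    {"id": 1, "keys": ["author-a|title-x"], "classes": ["DDC 179.9"]},
    {"id": 2, "keys": ["author-a|title-x", "author-a|title-y"], "classes": []},
    {"id": 3, "keys": ["author-a|title-y"], "classes": ["RVK CC 7200"]},
    {"id": 4, "keys": ["author-b|title-z"], "classes": ["RVK CC 7250"]},
]
clusters = cluster_records(records)
cluster_ids = sorted(sorted(rec["id"] for rec in c) for c in clusters)
shared = [consolidate(c) for c in clusters if len(c) > 1][0]
```

Records 1, 2 and 3 end up in one cluster, so the DDC class from record 1 and the RVK class from record 3 become a co-occurring pair even though no single record carries both.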
  • 23. Evaluation
  • 24. Comparison with existing mappings: existing (partial) mappings can be used as a basis for evaluation → "gold standard". Comparison of automatic and manual mapping. Recall: are all the mappings found? Precision: are all found mappings correct? Analysis of additional links: maybe the gold standard can be improved?
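Precision and recall against a gold standard reduce to simple set arithmetic. The sketch below assumes mappings are represented as (source class, target class) tuples; the function name and the placeholder class names are hypothetical. It also returns the additional links that are not in the gold standard, since those are the candidates for improving it.

```python
def evaluate(found, gold):
    """Compare automatically found mappings with a gold standard."""
    found, gold = set(found), set(gold)
    correct = found & gold
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    additional = found - gold  # links absent from the gold standard
    return precision, recall, additional


# Placeholder mappings for illustration only.
found = {("a", "x"), ("b", "y")}
gold = {("a", "x"), ("c", "z")}
precision, recall, additional = evaluate(found, gold)
```

Because the gold standard is only partial, a low precision does not necessarily mean the additional links are wrong; they need manual inspection.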
  • 25. Ongoing projects
  • 26. Data. Bibliographic data: all German library union catalogues, German National Library catalogue, Austrian National Library catalogue, British National Bibliography. Gold standards: partial mappings BK ↔ RVK.
  • 27. Interesting mappings. RVK → BK: gold standard exists; BK is well suited for faceted retrieval; RVK has the largest proportion of classified titles. RVK ↔ DDC: enable data sharing between the German National Library and the RVK-using libraries. Not limited to classification systems: see Pfeffer (2009) and Wang et al. (2009).
  • 28. Software features: using the "Metafactory" framework. Import of MAB2 and MARC data. Clustering: generation of keys for the match process; matching and clustering (full closure); consolidation of indexing and classification information. Statistics: single occurrence and co-occurrence counts; Jaccard measure. Output: full mappings.
  • 29. Thank you for listening. Slides available online: http://www.slideshare.net/MagnusPfeffer/ This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
  • 30. References
      Denton, W. (2012). On dentographs, a new method of visualizing library collections. In: Code4Lib.
      Isaac, A., Van Der Meij, L., Schlobach, S. and Wang, S. (2007). An empirical study of instance-based ontology matching. In: The Semantic Web (pp. 253-266). Springer Berlin Heidelberg.
      Legrady, G. (2005). Making visible the invisible. Seattle Library Data Flow Visualization. In: Digital Culture and Heritage. Proceedings of ICHIM05 Sept, 21-23.
      Pfeffer, M. (2009). Äquivalenzklassen – Alle Doppelstellen der RVK finden. Presentation given at the Librarian Workshop of the 33rd Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications (GfKl).
      Pfeffer, M. (2013). Using clustering across union catalogues to enrich entries with indexing information. In: Data Analysis, Machine Learning and Knowledge Discovery - Proceedings of the 36th Annual Conference of the Gesellschaft für Klassifikation e. V. Springer Berlin Heidelberg.
      Wang, S., Isaac, A., Schopman, B., Schlobach, S. and Van Der Meij, L. (2009). Matching multi-lingual subject vocabularies. In: Research and Advanced Technology for Digital Libraries (pp. 125-137). Springer Berlin Heidelberg.