Stefano Bargioni
Pontificia Università della Santa Croce

Catalogue enrichment: importing Dewey Decimal Classification from external sources

Oct 18, 2013

ADLUG 2013

The project
● Improving the Dewey search path
  – with minimal effort
  – while adding BNCF-compliant subject headings to our catalog
● Koha 3 <http://koha-community.org> open source ILS
● Can be applied to other ILSs

Version 1: The Batch Mode
● Add Dewey notations to the catalog
  – automatically
  – from selected sources
  – ensure quality and uniformity

Atomic copy cataloguing
● copy cataloguing usually involves the full record
● we only need to copy field 082 (MARC21) or 676 (UNIMARC)
● ISBN as the unique identifier
● the policy issue
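Because the ISBN is the only match key, a malformed ISBN cannot retrieve anything useful from a remote source. A minimal sketch of a check that could run before any query is sent, using the standard ISBN-10 and ISBN-13 check-digit rules. The function names are illustrative and are not taken from the original Perl programs.

```python
import re


def normalize_isbn(raw: str) -> str:
    """Keep the leading ISBN token and drop hyphens, e.g. '0-306-40615-2 (pbk.)'."""
    match = re.match(r"[\d\-]*[\dXx]", raw.strip())
    return match.group().replace("-", "").upper() if match else ""


def is_valid_isbn10(isbn: str) -> bool:
    """Weights 10..1; the weighted sum must be divisible by 11 ('X' counts as 10)."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Weights alternate 1, 3; the weighted sum must be divisible by 10."""
    if not re.fullmatch(r"\d{13}", isbn):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn(raw: str) -> bool:
    """Accept either form after normalization."""
    isbn = normalize_isbn(raw)
    return is_valid_isbn10(isbn) or is_valid_isbn13(isbn)


if __name__ == "__main__":
    for candidate in ("0-306-40615-2 (pbk.)", "9780306406157", "0306406153"):
        print(candidate, is_valid_isbn(candidate))
```

Filtering on the check digit only catches typing errors. It says nothing about whether the ISBN was reused by the publisher for several works.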

Records to be modified
● without Dewey notation
● with ISBN
● limit: 008 language
  – SELECT biblionumber, ISBN
    FROM biblio
    WHERE ISBN_present
    AND dewey_absent
    AND language_008='...'
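The three conditions in the schematic query can also be expressed outside the database. The sketch below applies them to a single MARCXML record in Python, assuming MARC21 tags (020 $a for the ISBN, 082 $a for Dewey, and positions 35-37 of field 008 for the language code). The function names are hypothetical; in Koha the same test runs inside the SQL WHERE clause.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Return an element name without its XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def subfields(record, tag: str, code: str) -> list:
    """Collect the non-empty values of one subfield across all repeats of a field."""
    values = []
    for field in record.iter():
        if _local(field.tag) == "datafield" and field.get("tag") == tag:
            for sub in field:
                if _local(sub.tag) == "subfield" and sub.get("code") == code and sub.text:
                    values.append(sub.text.strip())
    return values


def language_008(record) -> str:
    """Language code stored at positions 35-37 of MARC21 control field 008."""
    for field in record.iter():
        if _local(field.tag) == "controlfield" and field.get("tag") == "008":
            return (field.text or "")[35:38]
    return ""


def is_candidate(marcxml: str, language: str) -> bool:
    """True when the record has an ISBN, has no Dewey number, and matches the language."""
    record = ET.fromstring(marcxml)
    return (
        bool(subfields(record, "020", "a"))
        and not subfields(record, "082", "a")
        and language_008(record) == language
    )
```

For a UNIMARC catalogue the tags would differ (676 for Dewey, as noted earlier), so the tag values above would need to be parameters rather than constants.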

Note: in Koha, the WHERE clause is based on the MySQL function ExtractValue, which works on the field biblio.marcxml through XPath expressions.

Dewey Sources (I)
● a choice based on copy cataloguing experience
● OCLC Classify
● some National Libraries
● API, Z39.50 or HTML access

Dewey Sources (II): OCLC Classify
● Classify is a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.
● This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Bibliographic records are grouped using the OCLC FRBR Work-Set algorithm to form a work-level summary of the class numbers and subject headings assigned to a work. You can retrieve a summary by ISBN, ISSN, UPC, OCLC number, author/title, or subject heading.
● The Classify database is accessible through a user interface and as a machine-to-machine service. The database provides access to more than 36 million WorldCat records that contain Dewey Decimal Classification (DDC) numbers,[...].
● Retrieved information is in XML format.
● http://www.oclc.org/research/activities/classify.html?urlm=159746

Dewey Sources (III): National Libraries
LC    Library of Congress                       (any)  MARC
BNF   Bibliothèque nationale de France          (fre)  MARC
DNB   Deutsche Nationalbibliothek               (ger)  HTML
BNCF  Biblioteca Nazionale Centrale di Firenze  (ita)  HTML
BNCR  Biblioteca Nazionale Centrale di Roma     (ita)  HTML
BNB   British National Bibliography             (eng)  MARC

The logic used in the programs
● open the connection to the bibliographical database
● obtain the ISBN from records without a Dewey number
● open the connection to the Dewey source, if Z39.50
● for each ISBN
  ● query the data source using the current ISBN
  ● if a Dewey number is available in the response
    ● if the Dewey number passes quality control
      ● update the bibliographical record
  ● wait to avoid overloading
● close the connection to the Dewey source, if Z39.50
● close the connection to the bibliographical database
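The steps above can be sketched as a single loop. This is only an outline in Python, not the original Perl: the source lookup, the quality check and the catalogue update are passed in as functions, and every name here (`enrich`, `lookup`, `passes_quality`, `update`) is hypothetical. Opening and closing the database and Z39.50 connections is left to the caller.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def enrich(
    records: Iterable[Tuple[int, str]],
    lookup: Callable[[str], Optional[str]],
    passes_quality: Callable[[str], bool],
    update: Callable[[int, str], None],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run one harvesting pass over (biblionumber, ISBN) pairs and return counters."""
    stats = {"scanned": 0, "modified": 0, "not_found": 0, "discarded": 0}
    for biblionumber, isbn in records:
        stats["scanned"] += 1
        dewey = lookup(isbn)              # query the data source with the current ISBN
        if dewey is None:
            stats["not_found"] += 1       # no Dewey number in the response
        elif not passes_quality(dewey):
            stats["discarded"] += 1       # Dewey number found but rejected
        else:
            update(biblionumber, dewey)   # write the number into the record
            stats["modified"] += 1
        sleep(delay_seconds)              # wait to avoid overloading the source
    return stats


if __name__ == "__main__":
    fake_source = {"9780306406157": "853.914", "080442957X": "FIC"}
    changed = {}
    result = enrich(
        records=[(1, "9780306406157"), (2, "080442957X"), (3, "0000000000")],
        lookup=fake_source.get,
        passes_quality=lambda number: number[:3].isdigit(),
        update=changed.__setitem__,
        sleep=lambda seconds: None,       # skip real waiting in the demonstration
    )
    print(result, changed)
```

Keeping the three operations as parameters lets the same loop serve an HTTP source, a Z39.50 source or an HTML scraper, and makes it possible to run it against fake data before touching the catalogue.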

Quality check
● Catalogs contain errors
● DDC has many editions
● Our old Dewey numbers start from edition 19
● Indicators
● Lots of discarded Dewey numbers...
● ... but we moved from 40,000 to 60,000 records with a Dewey number (+50%)
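The slides do not spell out the acceptance rules, so the following is only a guess at what a quality check might look like. It assumes that a usable notation has three digits with an optional decimal part, that the "/" prime marks used in MARC21 field 082 should be stripped, and that a stated edition older than 19 is a reason to discard. Both the function names and the `min_edition` threshold are assumptions.

```python
import re

# Three digits, optionally followed by a decimal point and more digits.
DDC_PATTERN = re.compile(r"\d{3}(\.\d+)?")


def clean_ddc(raw: str) -> str:
    """Drop the '/' prime marks and surrounding whitespace from a Dewey notation."""
    return raw.replace("/", "").strip()


def passes_quality(raw: str, edition: str = "", min_edition: int = 19) -> bool:
    """Accept well-formed notations whose stated edition, if any, is recent enough."""
    number = clean_ddc(raw)
    if not DDC_PATTERN.fullmatch(number):
        return False
    stated = re.match(r"\d+", edition.strip())   # leading digits, e.g. '23' in '23/ger'
    if stated and int(stated.group()) < min_edition:
        return False
    return True


if __name__ == "__main__":
    for number, edition in (("853/.914", "22"), ("85.3", "22"), ("230", "18")):
        print(number, edition, passes_quality(number, edition))
```

A record with no edition statement is accepted here. A stricter policy could reject it instead, at the cost of discarding more numbers.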

Delay while searching sources
● Continuous searching can overwhelm remote servers
  – robots.txt
  – policies for crawlers
● Continuous indexing can overload your server
● Wait a few seconds between searches or groups of searches
  – this will slow the harvesting process
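One way to honour both points is to read the source's robots.txt and pause between requests. The sketch below uses Python's standard `urllib.robotparser`. The user-agent string, the three-second default and the helper names are assumptions made for illustration, not values from the original programs.

```python
import time
from urllib.robotparser import RobotFileParser


def polite_delay(robots_lines, user_agent, url, default_delay=3.0):
    """Return the seconds to wait before fetching url, or None if it is disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_lines)                 # lines of an already downloaded robots.txt
    if not parser.can_fetch(user_agent, url):
        return None
    delay = parser.crawl_delay(user_agent)     # None when no Crawl-delay is declared
    return float(delay) if delay is not None else default_delay


def throttled(items, delay, sleep=time.sleep):
    """Yield items one at a time, pausing between consecutive items."""
    for index, item in enumerate(items):
        if index:
            sleep(delay)
        yield item


if __name__ == "__main__":
    robots = ["User-agent: *", "Crawl-delay: 5", "Disallow: /private/"]
    wait = polite_delay(robots, "dewey-harvester", "http://example.org/search?isbn=1")
    print("delay:", wait)
    for isbn in throttled(["9780306406157", "080442957X"], delay=0.1):
        print("would query", isbn)
```

In a real run the robots.txt lines would come from the remote site, and the delay returned by `polite_delay` would be passed to `throttled`.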

Statistics
Source    Language  Records  Records   ISBN not  Dewey #    Dewey #
                    scanned  modified  found     not found  discarded
Classify  all       42387    10267     5321      6607       20059
LC        all       31999    1252      21195     8562       1011
BNF       all       30903    2253      21327     7268       55
DNB       ger       4193     163       3867      163        0
BNCF      ita       12017    4088      3643      3542       744
BNCR      ita       7549     1515      3003      2978       53
BNB       eng       6215     193       5449      55         518
Total                        19710

Several works with same ISBN: 8240
ISBN incorrect: 133
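The figures are easier to compare as the share of scanned records that each source enriched. The short calculation below assumes that the five numbers in each row are, in order: records scanned, records modified, ISBN not found, Dewey number not found, and Dewey number discarded. That column order is a reconstruction from the slide, not something it states explicitly.

```python
# (scanned, modified, isbn_not_found, dewey_not_found, dewey_discarded) per source,
# copied from the statistics slide; the column order is an assumption.
ROWS = {
    "Classify": (42387, 10267, 5321, 6607, 20059),
    "LC": (31999, 1252, 21195, 8562, 1011),
    "BNF": (30903, 2253, 21327, 7268, 55),
    "DNB": (4193, 163, 3867, 163, 0),
    "BNCF": (12017, 4088, 3643, 3542, 744),
    "BNCR": (7549, 1515, 3003, 2978, 53),
    "BNB": (6215, 193, 5449, 55, 518),
}


def hit_rate(source: str) -> float:
    """Fraction of scanned records that received a Dewey number from this source."""
    scanned, modified = ROWS[source][:2]
    return modified / scanned


if __name__ == "__main__":
    for source in sorted(ROWS, key=hit_rate, reverse=True):
        print(f"{source:<9}{hit_rate(source):6.1%}")
```

Read this way, the two Italian national libraries and Classify contribute most per query, which fits a catalogue with a large share of Italian-language records.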

Browsing Dewey Index
Besides author, uniform titles and subject headings, our OPAC offers a path of semantic search based on the Dewey classification number.

Software
● Query programs were written in Perl, making use of the Koha API and the following libraries available on CPAN:
  – LWP for HTTP connections
  – ZOOM for Z39.50 connections
  – DBI for connections to the MySQL database
  – XML::XPath for XML data processing
  – WWW::Scraper for HTML data processing
  – MARC::Record for MARC record processing

A scientific article
● published in JLIS.it at http://leo.cilea.it/index.php/jlis/article/view/8766
● JLIS.it, Italian Journal of Library and Information Science, is an academic journal of international scope, peer-reviewed and open access
● written with my cataloguers
● doesn't deal with the dynamic component

Version 2.0 - Single Record Mode
● New record:
  – enter the ISBN
  – retrieve Dewey from important catalogs
  – choose and import the best one into the new record
● Or upgrade an old record, adding or modifying its Dewey classification

Conclusions
● Increase of available bibliographic data on the net
● Unique identifiers
  – ISBN, ISSN, ...
  – VIAF ID, ISNI, ...
● Catalog enrichment
  – bibliographic records
  – authority records
● Expose rich linked data
  – with coded information like Dewey
  – with standard IDs like ISBN, ISNI, ...

Thank you
Gracias
Grazie

