I name thee Bay of Pe(a)rls : some practical virtues of Perl for cataloguers
                                                   Jenny Quilliam


Abstract

With the increasing numbers of aggregated electronic resources, libraries now tend to ‘collect in batches’.
These aggregated collections may not be permanent and are subject to frequent and significant content
changes. One survival strategy for Cataloguers is to ‘catalogue in batches’. While some publishers and vendors
are now supplying files of MARC records for their aggregated resources, these often need to be adapted by
libraries to include local authentication and access restriction information.

Perl (Practical Extraction and Report Language) is an easy-to-learn programming language which was
designed to work with chunks of text – extracting, pattern matching and replacing, and reporting. MARC records
are just long strings of highly formatted text, and scripting with Perl is a practical way to edit fields, add local
information, change subfields, delete unwanted fields etc. – any find-and-replace or insert operation for which
the algorithm can be defined.

As cataloguers are already familiar with MARC coding and can define the algorithms, learning a bit of Perl
means that cataloguers can easily add a few strings of Pe(a)rls to their repertoire of skills.


Introduction

In reviewing the literature on current and future roles for cataloguers, two major themes emerge: cataloguers
need to be outcomes focussed, and new competencies are required to address the challenges of providing
bibliographic access control for remote-access online resources.

Electronic resources – primarily fulltext electronic journals and fulltext aggregated databases – have
significantly improved libraries’ ability to deliver content to users regardless of time and distance. Integrated
access means that the library catalogue must reflect all the resources that can be accessed, especially those that
are just a few clicks away. Macro cataloguing approaches are needed to deal with the proliferation of electronic
resources and the high maintenance load caused by the content volatility of these resources, both long-term
and temporary.

In the United States, the Federal Library and Information Center Committee’s Personnel Working Group
(2001) is developing Knowledge, Skills and Abilities statements for its various professional groups. For
Catalogers, it has identified abilities including:

•   Ability to apply cataloging rules and adapt to changing rules and guidelines
•   Ability to accept and deal with ambiguity and make justifiable cataloguing decisions in the absence of
    clear-cut guidelines
•   Ability to create effective cataloging records where little or no precedent cataloguing exists

Anderson (2000) argues that without decrying the importance of individual title cataloguing, macro-
cataloguing approaches to manage large sets of records are essential. Responsibility for managing quality
control, editing, loading, maintaining and unloading requires the “Geek Factor”. In a column which outlined
skills required for librarians to manage digital collections, Tennant (1999) observed that while digital librarians
do not need to be programmers, it is useful to know one’s way around a programming language and while the
specific languages will vary a “general purpose language such as Perl can serve as a digital librarian’s Swiss
Army knife – something that can perform a variety of tasks quickly and easily”.

What is Perl and why is it useful?
Perl is the acronym for Practical Extraction and Report Language. It is a high-level interpreted language
optimized for scanning arbitrary text files and extracting, manipulating and reporting information from those
text files. Unpacking this statement:
• high-level = humans can read it
• interpreted = doesn’t need to be compiled and is thus easier to debug and correct
• text capabilities = Perl handles text in much the same way as people do

Perl is a low-cost – free – scripting language with very generous licensing provisions. To write a Perl script all
you need is any text editor – e.g. Notepad or Arachnophilia – as Perl scripts are just plain text files.

Perl is an outcomes focussed programming language – the ‘P’ in Perl means practical and it is designed to get
things done. This means that it is complete, easy to use and efficient. Perl uses sophisticated pattern-matching
techniques to scan large amounts of data very quickly and it can do tasks that in other programming languages
would be more complex, take longer to write, debug and test. There are often many ways to accomplish a task
in Perl.
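As a minimal sketch of this pattern matching, a single substitution statement performs the kind of find-and-replace discussed throughout this paper (the sample 245 field is invented for illustration):

```perl
#!/usr/bin/perl
# Minimal sketch: one s/// substitution doing a MARC-style
# find-and-replace. The sample 245 field is invented.
my $field = "245 00|aInternet resources|h[computer file]";

$field =~ s/\[computer file\]/[electronic journal]/;   # swap the GMD

print "$field\n";   # prints 245 00|aInternet resources|h[electronic journal]
```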

Perl is optimized for text processing – and this is precisely what is required in creating, editing and otherwise
manipulating MARC records. A word of caution: while Perl is more forgiving than many other programming
languages, there is a structure and syntax to be observed – in many ways familiar territory for cataloguers who
deal with AACR2R and MARC rules, coding and syntax.


Resources for learning Perl

There are many how-to books on Perl. If you have no previous programming knowledge, two introductory texts
are Paul Hoffman’s Perl 5 for dummies or Schwartz & Christiansen’s Learning Perl. Both are written in a
gentle tutorial style, with comprehensive indexes and detailed tables of contents. Another useful resource is the
Perl Cookbook, which contains around 1000 how-to-recipes for Perl – giving firstly the quick answer followed
by a detailed discussion of the answer to the problem.

For online resources, an Internet search on the phrase ‘Perl tutorial’ yields pages of results. Two examples of
beginner-level tutorials are Take 10 min to learn Perl and Nik Silver’s Perl tutorial.



How much Perl is needed to manipulate MARC records?

The good news is “not a lot” – there are a number of tools available to deal with the more challenging
intricacies of the MARC format – the directory structure and offsets, field and subfield terminators etc. These
MARC editing tools (discussed below) allow you to deal with MARC records in a tagged text format rather
than as a single string. Not only is a tagged text format much easier to read (for humans) but it can be easily
updated and manipulated using simple Perl scripts.

Certainly, to create a useful Perl script you need to learn how to open files for reading and writing, and
something about control structures, conditionals, and pattern matching and substitution.
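Putting those basics together, a skeleton of the kind of script used throughout this paper might look like the following. The filenames and the sample 650 edit are illustrative only (the script writes its own two-line sample input so it can be run as-is):

```perl
#!/usr/bin/perl
# Skeleton of the basics listed above: open files for reading and
# writing, loop over lines, test a condition, substitute, write out.
# The sample input and the 650 edit are illustrative only.
open(OUTFILE, ">sample_in.txt") or die "Can't open sample input\n";
print OUTFILE "650  0|aHistory|xPeriodicals.\n245 00|aSome title\n";
close(OUTFILE);

open(INFILE, "sample_in.txt") or die "Can't open input\n";
open(OUTFILE, ">sample_out.txt") or die "Can't open output\n";
while (<INFILE>)
{
  $TheLine = $_;
  # conditional: only subject heading lines get the substitution
  if ($TheLine =~ /^650/) { $TheLine =~ s/xPeriodicals/vPeriodicals/; }
  print OUTFILE $TheLine;
}
close(INFILE);
close(OUTFILE);
```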



MARC record tools

There are a range of MARC editing tools available for use and the Library of Congress maintains a listing of
MARC Specialized Tools at: http://lcweb.loc.gov/marc/marctools.html

MARCBreaker is a Library of Congress utility for converting MARC records into an ASCII text file format. It
has a complementary utility, MARCMaker, which can then be used to reformat from this file format into
MARC records. The current version only runs under DOS and Windows 95/98. There is also MarcEdit, a
companion utility to MARCBreaker/MARCMaker, developed by Terry Reese (2001). MarcEdit is currently in
version 3.0 and has a number of useful editing features including global field addition and deletion.

Simon Huggard and David Groenewegen in their paper ‘E-data management: data access and cataloguing
strategies to support the Monash University virtual library’ outline the use of MARCBreaker and MARCMaker
to edit record sets for various database aggregates. The Virtual University of Virginia (VIVA) has also used
MARCMaker together with the MARC.pm module to convert and manipulate MARC records for electronic
texts.

MARC.pm is a Perl 5 module for preprocessing, converting and manipulating MARC records. SourceForge
maintains an informative website for MARC.pm that includes documentation with a few examples. It is a
comprehensive module that can convert from MARC to ASCII, HTML, and XML and includes a number of
‘methods’ with options to create, delete and update MARC fields. Using MARC.pm requires a reasonable
knowledge of Perl and general programming constructs. MARC.pm is used by the JAKE project to create
MARC records. Michael Doran, University of Texas at Arlington, uses MARC.pm together with Perl scripts to
preprocess MARC records for netLibrary. A description of this project can be found at:
http://rocky.uta.edu/doran/preprocess/process.html

marc.pl is a Perl utility written by Steve Thomas of Adelaide University. It extracts records from a file of
MARC records and converts between standard MARC format and a tagged text representation, in either
direction. One of the best features of this utility is the ability to
add tags globally to each record by the use of a globals tagged text file. The marc.pl utility with documentation
is available for download at: www.library.adelaide.edu.au/~sthomas/scripts/

It uses command line switches to specify the output format and options to include a global file or skip records.
By default, marc.pl creates serial format MARC records, Leader ‘as’ so it is particularly suited to creating
records for electronic journals in aggregated databases and publisher collections. The tagged text format
required by marc.pl is simple – each field is on a separate line, the tag and indicator information is separated by
a space and subfields are terminated with a single dagger delimiter. Records are separated by a blank line.
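A line of this tagged text can be pulled apart with a couple of string operations – a sketch, using ‘|’ in place of the dagger delimiter as the appendices do:

```perl
#!/usr/bin/perl
# Sketch: pulling apart one line of marc.pl-style tagged text.
# '|' stands in for the dagger delimiter, as in the appendices.
my $line = "245 00|aHabitat Australia|h[electronic journal]";

my $tag        = substr($line, 0, 3);           # "245"
my $indicators = substr($line, 4, 2);           # "00"
my @subfields  = split(/\|/, substr($line, 6));
shift @subfields;                               # discard empty leading piece

print "tag=$tag indicators=$indicators\n";
foreach my $sf (@subfields) { print "subfield $sf\n"; }
```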

To use marc.pl it is helpful to know what Perl is and this is why I first dived [paddled is probably a more
accurate verb] into the world of Perl. Once in though, it is easy to learn enough to write simple Perl scripts.

Scenarios for Perl scripting with MARC records

Three scenarios where Perl scripting is used for cataloguing purposes:
    • Creating brief MARC records from delimited titles lists
    • Editing vendor-supplied MARC record files to adapt for local requirements
    • Deriving MARC records for ejournals based on the print version.

The Final report of the Program for Cooperative Cataloging’s Task Group on Journals in Aggregator Databases
(2000) provides a useful checklist of appropriate tags and content when scripting to either create or derive
MARC records. It lists proposed data elements for both machine-generated and machine-derived (i.e. from
existing print records) aggregator analytics records.

Depending on whether there is an existing file of MARC records the records creation/manipulation process
steps are:

1.   Convert from MARC to tagged text using marc.pl, or capture the vendor’s delimited titles, ISSN and coverage file
2.   Edit the tagged text using a locally written Perl script
3.   Create a globals tagged text file for fields, including a default holdings tag, to be added to each record
4.   Convert from tagged text to MARC using marc.pl
5.   Load the resulting file of MARC records to the library system

Creating brief MARC records from delimited titles lists

When no MARC record set exists for an aggregated database, Perl scripts are used to parse delimited titles,
ISSN, coverage and URL information into MARC tagged text. The resulting tagged text file is then formatted
to MARC, incorporating a global tagged text file, using marc.pl to create a set of records.

In brief, all the Perl script has to do is to open the input file for reading, parse the information into the
appropriate fields, format it as tagged text and write the tags to an output file. This approach has been used to
create records for several databases including IDEAL, Emerald, Dow Jones Interactive and BlackwellScience.
For some publisher databases, fuller records with subject access have been created by adding one or more
subject heading terms for each title in the delimited titles file.
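The subject-access step just described can be sketched with a lookup hash keyed on title – the titles and heading terms below are invented for illustration:

```perl
#!/usr/bin/perl
# Sketch of adding subject access: a hash keyed on title supplies
# one or more heading terms. Titles and headings are invented.
my %subjects = (
  "Library Management"       => "Library administration",
  "Journal of Documentation" => "Information science",
);

my $Title = "Library Management";   # as parsed from the delimited titles file
if (exists $subjects{$Title})
  { print "650  0|a$subjects{$Title}|vPeriodicals.\n"; }
```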

Appendix 1 shows the simple Perl script written to process Emerald records. Appendix 2 shows an example of
the resulting tagged text together with the global file used for Emerald.



Editing Vendor-supplied MARC records

Database vendors now make available files of records for their various aggregated databases. EBSCO
Publishing had undertaken a pilot project for the PCC Task Group on Aggregator Databases to derive records
for aggregated databases and their records are freely available to subscribers. When the University of South
Australia subscribed to the Ebsco MegaFile offer in late 1999, the availability of full MARC records was
regarded as a definite advantage. However these records required preprocessing to include UniSA-specific
information, change the supplied title level URLs to incorporate Digital Island access, and add a second URL
for off campus clients. Additional edits included changing the GMD from [computer file] to [electronic
journal] and altering the form subdivision coding in subject headings from ‘x’ to ‘v’. Again, to enable bulk
deletion for maintenance purposes, a tag to create a default holding was required.

The Perl scripts for these files do string pattern matching and substitution, or [semi-global] find-and-replace
operations. In many cases these changes could be done with a decent text editor with find/replace capabilities,
and if dealing with the records on a one-off basis this is a practical process. However aggregator databases are
notoriously volatile – changing content frequently – and hence the record sets need to be deleted and new files
downloaded from the vendor site, edited and loaded to the library system. So it’s worth spending a little time to
write a custom Perl editing script. Appendix 3 shows a script to edit Ebsco-sourced records.

Until mid-2000, Ebsco did not include publisher’s embargo periods in their MARC records but maintained a
separate embargoes page – hence further scripting to incorporate this information was needed. Vendor MARC
records are also available for the Gale and Proquest databases.

A variation of this process is also used to preprocess netLibrary MARC records – adding a default holding and
a second remote-authentication URL, and editing the GMD.

Deriving MARC records for ejournals from print records

The third scenario where Perl scripts are used with MARC records is deriving records for the electronic version
from existing records. At UniSA we have reworked existing MARC records for print titles to create ejournal
records for APAIS FullText. No records were available for the ejournals and, as we already had print records
for a majority of titles, it was decided to rework these into ejournal records. Title, ISSN and coverage
information was captured from the Informit site and edited into a spreadsheet. During the pre-subscription
evaluation process, APAIS FullText titles had been searched against the UniSA catalogue and the bibkeys of existing
records noted. MARC records for these titles were exported from the catalogue as tagged text. For the titles not
held at UniSA, bibliographic records were captured to file from Kinetica and then converted to tagged text. The
ISSN and coverage data was also exported in tab-delimited format from the spreadsheet. By matching on ISSN,
the fulltext coverage information could be linked to each title and incorporated into the MARC record.
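The ISSN match amounts to building a hash of coverage data keyed on ISSN and looking each record’s ISSN up in it – a sketch, using the Habitat Australia data shown in Appendix 4:

```perl
#!/usr/bin/perl
# Sketch of the ISSN match: coverage data keyed on ISSN, looked up
# as each record is processed. Data taken from the Appendix 4 example.
my %coverage = ( "0310-2939" => "Vol. 24- (June 1996-)" );

my $ISSN = "0310-2939";   # as captured from the record being processed
if (exists $coverage{$ISSN})
  { print "856 41|zSelected fulltext available: $coverage{$ISSN}.\n"; }
```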

The records were edited following the PCC’s (2000) proposed data elements for machine-derived records –
deleting unwanted fields, adding and editing fields as needed. A globals file was used to add tag 006 and 007
data, tag 530 additional physical form note, a 590 local access information note, a 773 Host-item entry for the
database, a 710 for the vendor Informit and a default local holdings tag.

The Perl script to process records is longer than the earlier examples but no more complex – it just does more
deleting, updating and reworking. Appendix 4 shows an example of a print record for APAIS Fulltext – the
original print form, the edited form, the globals file and the final record as an ejournal.

Conclusion

While Perl is currently mostly used to deal with the challenges of providing and maintaining MARC records
for electronic resources, scripts are also used to post-process original cataloguing for all formats for batch
uploading to Kinetica. The uses of Perl in the cataloguer’s toolkit can be many and varied – it is a not-so-little
language that can and does! And it’s fun!



Appendix 1 – Perl script to edit Emerald titles file
#!/usr/local/bin/perl
# Script to edit Emerald tab-delimited title file into tagged text
# Entries contain Title, ISSN, Coverage and specific URL
# Written: Jenny Quilliam Revised: August 2001
# Command line >perl Emerald_RPA.pl [INPUT FILE] [OUTPUT FILE]
#
#################################################################################

$TheFile = shift;
$OutFile = shift;

open(INFILE, $TheFile) or die "Can't open Input\n";
open(OUTFILE, ">$OutFile") or die "Can't open Output\n";

# control structure to read and process each line from the input file
while (<INFILE>)
{
  s/"//g ;    # deleting any quote marks from the string
  $TheLine = $_ ;

  chomp($TheLine);

  # parsing the contents at the tab delimiters to populate the variables
  ($ISSN, $Title, $Coverage, $URL) = split(/\t/, $TheLine);

  # printing out blank line between records
  print OUTFILE "\n";

  # processing ISSN
  print OUTFILE "022   |a$ISSN\n" ;

  # processing Title - fixing filing indicators
  # checking for leading The in Title
  if($Title =~ /^The /)
    {print OUTFILE "245 04|a$Title|h[electronic journal]\n"; }
  else
    {print OUTFILE "245 00|a$Title|h[electronic journal]\n";}

  # processing to generate URL tag with Coverage info
  print OUTFILE "856 40|zFulltext from: $Coverage.";
  print OUTFILE " This electronic journal is part of the Emerald database.";
  print OUTFILE " Access within University network.|u$URL\n";

  # adding generic RPA URL link to all records
  print OUTFILE "856 41|zAccess outside University network.";
  print OUTFILE "|uhttp://librpa.levels.unisa.edu.au/rpa/webauth.exe?rs=emerald\n";
}

close(INFILE);
close(OUTFILE);
Appendix 2 – Global and example of tagged text for Emerald titles
006 m      d
007 cr cn-
008 001123c19uu9999enkuu p         0   a0eng d
040   |aSUSA|beng|cSUSA
260   |a[Bradford, England :|bMCB University Press.]
530   |aOnline version of the print publication.
590   |aAvailable to University of South Australia staff and students. Access is
by direct login from computers within the University network or by authenticated
remote access. Articles available for downloading in PDF and HTML formats.
773 0 |tEmerald
991   |cEJ|nCAE|tNFL
___________________________________________________________________________
001 jaq00-05205
245 00|aAsia Pacific Journal of Marketing & Logistics|h[electronic journal]
022   |a0945-7517
856 40|zFulltext from: 1998. This electronic journal is part of the Emerald
library database. Access within University network.
|uhttp://www.emeraldinsight.com/094-57517.htm
856 41|zAccess outside University network.
|uhttp://librpa.levels.unisa.edu.au/rpa/webauth.exe?rs=emerald



Appendix 3 – Perl script to edit Ebsco sourced records
#!/usr/local/bin/perl
#
# Author: Jenny Quilliam            November 2000
#
# Program to edit EbscoHost records [as converted to text using marc.pl]
# GMD to be altered to: electronic journal
# Form subfield coding to be altered to v
# French subject headings to be deleted
# Fix URL to incorporate Digital Island access
# Command line string, takes 2 arguments:
# Command line:   mlx> perl EHedit.pl [input filename] [output filename]
#############################################################################

  $TheFile = shift;
  $OutFile = shift;
open(INFILE, $TheFile) or die "Can't open input\n";
open(OUTFILE, ">$OutFile") or die "Can't open output\n";
while (<INFILE>)
{
$TheLine = $_ ;
# processing selected lines only

# editing the GMD in the 245 field from [computer file] to [electronic journal]
 if($TheLine =~ /^245/) { $TheLine =~ s/computer file/electronic journal/g;} #

# editing subject headings to fix form subdivision subfield character
 if($TheLine =~ /^65/) { $TheLine =~ s/xPeriodicals/vPeriodicals/g;}

# editing out French subject headings
 if($TheLine =~ /^650 6/) {next}


# editing URL to add .global to string for Digital Island address
 if($TheLine =~ /^856/) {$TheLine =~ s/search.epnet/search.global.epnet/g ;}

print $TheLine;
print OUTFILE $TheLine;
}
close(INFILE);
close(OUTFILE);
Appendix 4 – APAIS FullText examples
Print record
LDR 00824nas 2200104 a 4500
 001 dup91000065
 008 820514c19739999vrabr p       0 0 0eng d
 022 0 $a0310-2939
 035    $a(atABN)2551638
 035    $u145182
 040    $dSIT$dSCAE
 043    $au-at---
 082 0 $a639.9$219
 245 00 $aHabitat Australia.
 259 00 $aLC$bP639.9 H116$cv.2, no.1 (Mar. 1974)-
 260 01 $aHawthorn, Vic. :$bAustralian Conservation Foundation,$c1973-
 300    $av. :$bill. (some col.), maps ;$c28 cm.
 362 0 $aVol. 1, no. 1 (June 1973)-
 580    $aAbsorbed Peace magazine Australia. Vol 15, no. 4 (Aug. 1987)
 650 0 $aNatural resources$xResearch$zAustralia.
 650 0 $aConservation of natural resources$zAustralia.
 710 20 $aAustralian Conservation Foundation.
 780 05 $tPeace magazine Australia$x0817-895X
 984    $a2036$cCIT PER 304.2 HAB v.1 (1973)-$cUND PER 304.2 HAB v.1 (1973)-$cMAG
PER 333.9506 H116 v.1 (1973)-$cSAL PER 333.705 H11 v.1 (1973)-
EndRecord

Edited record

LDR 00824nas 2200104 a 4500
001 jaq01-0607
008 820514c19739999vrabr p       0 0 0eng d
022 0 |a0310-2939
082 0 |a639.9|219
245 00|aHabitat Australia|h[electronic journal].
260   |aHawthorn, Vic. :|bAustralian Conservation Foundation,|c1973-
362 0 |aVol. 1, no. 1 (June 1973)-
580   |aAbsorbed Peace magazine Australia. Vol 15, no. 4 (Aug. 1987)
650 0|aNatural resources|xResearch|zAustralia.
650 0|aConservation of natural resources|zAustralia.
710 2 |aAustralian Conservation Foundation.
780 05|tPeace magazine Australia|x0817-895X
856 41|zSelected fulltext available: Vol. 24- (June 1996-). Access via Australian
public affairs full text.|uhttp://www.informit.com.au
991   |cEJ|nCAE|tNFL

Globals file
006 m      d
007 cr anu
040   |aSUSA
530   |aOnline version of the print title.
590   |aAvailable to University of South Australia staff and students. Access is
by direct login from computers within the University network or by login and
password for remote users. File format and amount of fulltext content of journals
varies.
710 2 |aInformit.
773 0 |tAustralian public affairs full text|dMelbourne, Vic. : RMIT Publishing,
2000-.
991 |cEJ|nCAE|tNFL
References

Anderson, B., 1999, ‘Cataloging issues’ paper presented to Technical Services Librarians: the training we
need, the issues we face, PTPL Conference 1999. http://www.lib.virginia.edu/ptpl/anderson.html

Christiansen, T. & Torkington, N., 1998, Perl cookbook, O’Reilly, Sebastopol CA.

FLICC Personnel Working Group (2001) Sample KSAs for Librarian Positions: Catalogers
http://www.loc.gov/flicc/wg/ksa-cat.html

Hoffman, P. 1997, Perl 5 for dummies, IDG Books, Foster City CA.

Huggard, S. & Groenewegen, D., 2001, ‘E-data management: data access and cataloguing strategies to support
the Monash University virtual library’, LASIE, April 2001, p.25-42.

Library of Congress’s MARCBreaker and MARCMaker programs available at:
http://lcweb.loc.gov/marc/marctools.html

Program for Cooperative Cataloging Task Group on Aggregator Databases, 2000, Final report.
http://lcweb.loc.gov/catdir/pcc/aggfinal.html

Reese, T. MarcEdit 3.0 program available at: http://ucs.orst.edu/~reeset/marcedit/index.html

Schwartz, R. & Christiansen, T. 1997, Learning Perl, 2nd ed., O’Reilly, Sebastopol CA.

Silver, Nik, Perl tutorial. http://fpg.uwaterloo.ca:80/perl/

Take 10 min to learn Perl http://www.geocities.com/SiliconValley/7331/ten_perl.html

Tennant, R. 1999 ‘Skills for the new millennium’, LJ Digital, January 1, 1999.
http://www.libraryjournal.com/articles/infotech/digitallibraries/19990101_412.htm

Thomas, S. marc.pl utility available at: http://www.library.adelaide.edu.au/~sthomas/scripts/

Using MARC.pm with batches of MARC records : the VIVA experience, 2000. [Online]
http://marcpm.sourceforge.net/examples/viva.html




Author
Jenny Quilliam
Coordinator (Records)
Technical Services
University of South Australia Library
Email: jenny.quilliam@unisa.edu.au

Weitere ähnliche Inhalte

Andere mochten auch

Разбираемся с разницей между блогом, веб сайтом и
Разбираемся  с разницей между блогом, веб сайтом иРазбираемся  с разницей между блогом, веб сайтом и
Разбираемся с разницей между блогом, веб сайтом иЕлена
 
Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>
Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>
Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>tutorialsruby
 
CSS-Tutorial-boxmodel
CSS-Tutorial-boxmodelCSS-Tutorial-boxmodel
CSS-Tutorial-boxmodeltutorialsruby
 

Andere mochten auch (7)

Разбираемся с разницей между блогом, веб сайтом и
Разбираемся  с разницей между блогом, веб сайтом иРазбираемся  с разницей между блогом, веб сайтом и
Разбираемся с разницей между блогом, веб сайтом и
 
2005%5CCOMP202-Sem1
2005%5CCOMP202-Sem12005%5CCOMP202-Sem1
2005%5CCOMP202-Sem1
 
CL2009_ANNIS_pre
CL2009_ANNIS_preCL2009_ANNIS_pre
CL2009_ANNIS_pre
 
Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>
Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>
Build Comet applications using Scala, Lift, and &lt;b>jQuery&lt;/b>
 
CSS-Tutorial-boxmodel
CSS-Tutorial-boxmodelCSS-Tutorial-boxmodel
CSS-Tutorial-boxmodel
 
Lezing #NAD Pedro de Bruyckere
Lezing #NAD Pedro de BruyckereLezing #NAD Pedro de Bruyckere
Lezing #NAD Pedro de Bruyckere
 
Pylons
PylonsPylons
Pylons
 

Ähnlich wie CatConf2001

Future of Cataloging, Classification
Future of Cataloging, ClassificationFuture of Cataloging, Classification
Future of Cataloging, ClassificationDenise Garofalo
 
ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!
ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!
ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!panagenda
 
UiTM IM110 IMD253 : ORGANIZATION OF INFORMATION (IMD253) Individual Assignment
UiTM IM110 IMD253 : ORGANIZATION OF INFORMATION (IMD253) Individual Assignment UiTM IM110 IMD253 : ORGANIZATION OF INFORMATION (IMD253) Individual Assignment
UiTM IM110 IMD253 : ORGANIZATION OF INFORMATION (IMD253) Individual Assignment Kumprinx Amin
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsMaxime Lefrançois
 
BIBFRAME, Linked data, RDA
BIBFRAME, Linked data, RDA BIBFRAME, Linked data, RDA
BIBFRAME, Linked data, RDA robin fay
 
Rdf Processing Tools In Java
Rdf Processing Tools In JavaRdf Processing Tools In Java
Rdf Processing Tools In JavaDicusarCorneliu
 
Ppt programming by alyssa marie paral
Ppt programming by alyssa marie paralPpt programming by alyssa marie paral
Ppt programming by alyssa marie paralalyssamarieparal
 
MarcEdit Shelter-In-Place Webinar 4: Merging, Clustering, and Integrations…oh...
MarcEdit Shelter-In-Place Webinar 4: Merging, Clustering, and Integrations…oh...MarcEdit Shelter-In-Place Webinar 4: Merging, Clustering, and Integrations…oh...
MarcEdit Shelter-In-Place Webinar 4: Merging, Clustering, and Integrations…oh...Terry Reese
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web workPaul Houle
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the HaystackAdrian Stevenson
 
OOP Comparative Study
OOP Comparative StudyOOP Comparative Study
OOP Comparative StudyDarren Tan
 
Intro_to_fswad_ppt_by_abhay (1).pptx
Intro_to_fswad_ppt_by_abhay (1).pptxIntro_to_fswad_ppt_by_abhay (1).pptx
Intro_to_fswad_ppt_by_abhay (1).pptxdipen55
 
Corpus Linguistics :Analytical Tools
Corpus Linguistics :Analytical ToolsCorpus Linguistics :Analytical Tools
Corpus Linguistics :Analytical ToolsJitendra Patil
 
Dublin Core Metadata Tutorial.ppt
Dublin Core Metadata Tutorial.pptDublin Core Metadata Tutorial.ppt
Dublin Core Metadata Tutorial.pptBharath Abbareddy
 
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...datascienceiqss
 
Evaluation criteria for nosql databases
Evaluation criteria for nosql databasesEvaluation criteria for nosql databases
Evaluation criteria for nosql databasesEbenezer Daniel
 
An Annotation Framework For The Semantic Web
An Annotation Framework For The Semantic WebAn Annotation Framework For The Semantic Web
An Annotation Framework For The Semantic WebAndrea Porter
 

Ähnlich wie CatConf2001 (20)

Future of Cataloging, Classification
Future of Cataloging, ClassificationFuture of Cataloging, Classification
Future of Cataloging, Classification
 
Web Spa
Web SpaWeb Spa
Web Spa
 
ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!
ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!
ICON UK '13 - Apache Software: The FREE Java toolbox you didn't know you had !!
 
UiTM IM110 IMD253 : ORGANIZATION OF INFORMATION (IMD253) Individual Assignment
UiTM IM110 IMD253 : ORGANIZATION OF INFORMATION (IMD253) Individual Assignment
Overview of the SPARQL-Generate language and latest developments
BIBFRAME, Linked data, RDA
Rdf Processing Tools In Java
Ppt programming by alyssa marie paral
Pearl
perl lauange
MarcEdit Shelter-In-Place Webinar 4: Merging, Clustering, and Integrations…oh...
Making the semantic web work
How to Find a Needle in the Haystack
OOP Comparative Study
Intro_to_fswad_ppt_by_abhay (1).pptx
Corpus Linguistics :Analytical Tools
Dublin Core Metadata Tutorial.ppt
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Evaluation criteria for nosql databases
An Annotation Framework For The Semantic Web

More from tutorialsruby

TopStyle Help & Tutorial
The Art Institute of Atlanta IMD 210 Fundamentals of Scripting
Standardization and Knowledge Transfer – INS0
xhtml_basics
xhtml-documentation
CSS
HowTo_CSS
BloggingWithStyle_2008
cascadingstylesheets


Recently uploaded

Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
GenAI and AI GCC State of AI_Object Automation Inc
Secure your environment with UiPath and CyberArk technologies - Session 1
COMPUTER 10 Lesson 8 - Building a Website
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Computer 10: Lesson 10 - Online Crimes and Hazards
UiPath Studio Web workshop series - Day 6
OpenShift Commons Paris - Choose Your Own Observability Adventure
Nanopower In Semiconductor Industry.pdf
Comparing Sidecar-less Service Mesh from Cilium and Istio
Artificial Intelligence & SEO Trends for 2024
Spring24-Release Overview - Wellingtion User Group-1.pdf
Do we need a new standard for visualizing the invisible?
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Introduction to Matsuo Laboratory (ENG).pptx
Basic Building Blocks of Internet of Things.


CatConf2001

  • 1. I name thee Bay of Pe(a)rls : some practical virtues of Perl for cataloguers
Jenny Quilliam

Abstract

With the increasing numbers of aggregated electronic resources, libraries now tend to 'collect in batches'. These aggregated collections may not be permanent and are subject to frequent and significant content changes. One survival strategy for cataloguers is to 'catalogue in batches'. While some publishers and vendors are now supplying files of MARC records for their aggregated resources, these often need to be adapted by libraries to include local authentication and access restriction information.

Perl (Practical Extraction and Report Language) is an easy-to-learn programming language designed to work with chunks of text – extracting, pattern matching and replacing, and reporting. MARC records are just long strings of highly formatted text, and scripting with Perl is a practical way to edit fields, add local information, change subfields, delete unwanted fields and so on – any find-and-replace or insert operation for which the algorithm can be defined.

As cataloguers are already familiar with MARC coding and can define the algorithms, learning a bit of Perl means that cataloguers can easily add a few strings of Perls to their repertoire of skills.

Introduction

In reviewing the literature on current and future roles for cataloguers, two major themes emerge: cataloguers need to be outcomes focussed, and new competencies are required to address the challenges of providing bibliographic access control for remote-access online resources.

Electronic resources – primarily fulltext electronic journals and fulltext aggregated databases – have significantly improved libraries' ability to deliver content to users regardless of time and distance. Integrated access means that the library catalogue must reflect all the resources that can be accessed, especially those that are just a few clicks away.
Macro cataloguing approaches are needed to deal with the proliferation of electronic resources and the high maintenance load caused by the content volatility of these resources, both long-term and temporary.

In the United States, the Federal Library and Information Center Committee's Personnel Working Group (2001) is developing Knowledge, Skills and Abilities statements for its various professional groups. For Catalogers, it has identified abilities including:
• Ability to apply cataloging rules and adapt to changing rules and guidelines
• Ability to accept and deal with ambiguity and make justifiable cataloguing decisions in the absence of clear-cut guidelines
• Ability to create effective cataloging records where little or no precedent cataloguing exists

Anderson (2000) argues that, without decrying the importance of individual title cataloguing, macro-cataloguing approaches to manage large sets of records are essential. Responsibility for managing quality control, editing, loading, maintaining and unloading requires the "Geek Factor". In a column which outlined skills required for librarians to manage digital collections, Tennant (1999) observed that while digital librarians do not need to be programmers, it is useful to know one's way around a programming language, and while the specific languages will vary, a "general purpose language such as Perl can serve as a digital librarian's Swiss Army knife – something that can perform a variety of tasks quickly and easily".
  • 2. What is Perl and why is it useful?

Perl is the acronym for Practical Extraction and Report Language. It is a high-level interpreted language optimized for scanning arbitrary text files and extracting, manipulating and reporting information from those files. Unpacking this statement:
• high-level = humans can read it
• interpreted = doesn't need to be compiled and is thus easier to debug and correct
• text capabilities = Perl handles text in much the same way as people do

Perl is a low cost – free – scripting language with very generous licensing provisions. To write a Perl script all you need is any text editor – e.g. Notepad or Arachnophilia – as Perl scripts are just plain text files.

Perl is an outcomes focussed programming language – the 'P' in Perl means practical, and it is designed to get things done. This means that it is complete, easy to use and efficient. Perl uses sophisticated pattern-matching techniques to scan large amounts of data very quickly, and it can do tasks that in other programming languages would be more complex and take longer to write, debug and test. There are often many ways to accomplish a task in Perl.

Perl is optimized for text processing – and this is precisely what is required in creating, editing and otherwise manipulating MARC records. A word of caution: while Perl is more forgiving than many other programming languages, there is a structure and syntax to be observed – in many ways familiar territory for cataloguers who deal with AACR2R and MARC rules, coding and syntax.

Resources for learning Perl

There are many how-to books on Perl. If you have no previous programming knowledge, two introductory texts are Paul Hoffman's Perl 5 for dummies and Schwartz & Christiansen's Learning Perl. Both are written in a gentle tutorial style, with comprehensive indexes and detailed tables of contents.
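As a taste of the pattern matching mentioned above, a single Perl substitution can rewrite part of a tagged MARC field. A minimal sketch – the sample field below is invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One tagged-text MARC field (sample data invented for illustration)
my $field = '245 00|aManagement Decision|h[computer file]';

# A single substitution swaps the GMD - an edit discussed later in this paper
$field =~ s/\[computer file\]/[electronic journal]/;

print "$field\n";   # 245 00|aManagement Decision|h[electronic journal]
```

The same one-line pattern applied inside a loop is all that many of the scripts in the appendices do.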
Another useful resource is the Perl Cookbook, which contains around 1000 how-to recipes for Perl – giving first the quick answer, followed by a detailed discussion of the answer to the problem.

For online resources, an Internet search on the phrase 'Perl tutorial' yields pages of results. Two examples of beginner-level tutorials are Take 10mins to learn Perl and Nik Silver's Perl tutorial.

How much Perl is needed to manipulate MARC records?

The good news is "not a lot" – there are a number of tools available to deal with the more challenging intricacies of the MARC format – the directory structure and offsets, field and subfield terminators etc. These MARC editing tools (discussed below) allow you to deal with MARC records in a tagged text format rather than as a single string. Not only is a tagged text format much easier for humans to read, it can also be easily updated and manipulated using simple Perl scripts. Certainly, to create a useful Perl script you need to learn how to open files for reading and writing, something about control structures, conditionals, and pattern matching and substitution.

MARC record tools

There are a range of MARC editing tools available, and the Library of Congress maintains a listing of MARC Specialized Tools at: http://lcweb.loc.gov/marc/marctools.html

MARCBreaker is a Library of Congress utility for converting MARC records into an ASCII text file format. It has a complementary utility, MARCMaker, which can then be used to reformat from this file format back into MARC records. The current version only runs under DOS and Windows 95/98. There is also a companion
  • 3. MarcEdit utility to MARCBreaker/MARCMaker, developed by Terry Reese (2001). MarcEdit is currently in version 3.0 and has a number of useful editing features, including global field addition and deletion.

Simon Huggard and David Groenewegen, in their paper 'E-data management: data access and cataloguing strategies to support the Monash University virtual library', outline the use of MARCBreaker and MARCMaker to edit record sets for various database aggregates. The Virtual Library of Virginia (VIVA) has also used MARCMaker together with the MARC.pm module to convert and manipulate MARC records for electronic texts.

MARC.pm is a Perl 5 module for preprocessing, converting and manipulating MARC records. SourceForge maintains an informative website for MARC.pm that includes documentation with a few examples. It is a comprehensive module that can convert from MARC to ASCII, HTML and XML, and includes a number of 'methods' with options to create, delete and update MARC fields. Using MARC.pm requires a reasonable knowledge of Perl and general programming constructs. MARC.pm is used by the JAKE project to create MARC records. Michael Doran, University of Texas at Arlington, uses MARC.pm together with Perl scripts to preprocess MARC records for netLibrary. A description of this project can be found at: http://rocky.uta.edu/doran/preprocess/process.html

marc.pl is a Perl utility written by Steve Thomas of Adelaide University. It extracts records from a file of MARC records and converts records between standard MARC format and a tagged text representation, and vice versa. One of its best features is the ability to add tags globally to each record through the use of a globals tagged text file. The marc.pl utility with documentation is available for download at: www.library.adelaide.edu.au/~sthomas/scripts/ It uses command line switches to specify the output format, and options to include a global file or skip records.
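Whatever tool produces the tagged text, the editing scripts that work on it share one basic shape: a read-edit-write loop over the file. A minimal self-contained sketch – the filenames and sample field are invented, and a real marc.pl output file would of course contain full records:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Create a tiny sample input file so the sketch is self-contained;
# in practice this file would come from marc.pl or a vendor.
open(my $tmp, '>', 'sample_in.txt') or die "Can't create sample: $!";
print $tmp "650  0|aCataloging|xHandbooks.\n";
close($tmp);

open(my $in,  '<', 'sample_in.txt')  or die "Can't open input: $!";
open(my $out, '>', 'sample_out.txt') or die "Can't open output: $!";

while (my $line = <$in>) {
    # Example edit: change form subdivision code x to v in 650 fields
    $line =~ s/\|x/|v/g if $line =~ /^650/;
    print $out $line;
}

close($in);
close($out);
```

Every script in the appendices is a variation on this loop; only the edits inside it change.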
By default, marc.pl creates serial format MARC records (Leader 'as'), so it is particularly suited to creating records for electronic journals in aggregated databases and publisher collections. The tagged text format required by marc.pl is simple – each field is on a separate line, the tag and indicator information is separated by a space, and subfields are delimited with a single dagger character. Records are separated by a blank line.

To use marc.pl it is helpful to know what Perl is, and this is why I first dived [paddled is probably a more accurate verb] into the world of Perl. Once in, though, it is easy to learn enough to write simple Perl scripts.

Scenarios for Perl scripting with MARC records

Three scenarios where Perl scripting is used for cataloguing purposes:
• Creating brief MARC records from delimited titles lists
• Editing vendor-supplied MARC record files to adapt them for local requirements
• Deriving MARC records for ejournals based on the print version.

The Final report of the Program for Cooperative Cataloging's Task Group on Journals in Aggregator Databases (2000) provides a useful checklist of appropriate tags and content when scripting to either create or derive MARC records. It lists proposed data elements for both machine-generated and machine-derived (i.e. from existing print records) aggregator analytics records.

Depending on whether there is an existing file of MARC records, the record creation/manipulation process steps are:
1. Convert from MARC to tagged text using marc.pl, or capture the vendor's delimited titles, ISSN and coverage file
2. Edit the tagged text using a locally written Perl script
3. Create a globals tagged text file for fields, including a default holdings tag, to be added to each record
4. Convert from tagged text to MARC using marc.pl
5. 
Load the resulting file of MARC records to the library system.

Creating brief MARC records from delimited titles lists

When no MARC record set exists for an aggregated database, Perl scripts are used to parse delimited title, ISSN, coverage and URL information into MARC tagged text. The resulting tagged text file is then formatted to MARC, incorporating a global tagged text file, using marc.pl to create a set of records.
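The parsing step can be distilled to a few lines of Perl: split each tab-delimited line from the vendor's list, then emit the corresponding tagged text fields. A minimal sketch – the sample line is invented, and the field layout follows the Emerald example in Appendix 2:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One invented line from a vendor's tab-delimited titles list:
# ISSN <tab> Title <tab> Coverage
my $line = "1234-5678\tJournal of Example Studies\t1998-";
my ($issn, $title, $coverage) = split(/\t/, $line);

# Emit the corresponding tagged text fields
print "022   |a$issn\n";
print "245 00|a$title|h[electronic journal]\n";
print "856 40|zFulltext from: $coverage.\n";
```

Appendix 1 shows the full production version of this idea for the Emerald titles file.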
  • 4. In brief, all the Perl script has to do is open the input file for reading, parse the information into the appropriate fields, format it as tagged text and write the tags to an output file. This approach has been used to create records for several databases, including IDEAL, Emerald, Dow Jones Interactive and Blackwell Science. For some publisher databases, fuller records with subject access have been created by adding one or more subject heading terms for each title in the delimited titles file. Appendix 1 shows the simple Perl script written to process Emerald records. Appendix 2 shows an example of the resulting tagged text, together with the global file used for Emerald.

Editing vendor-supplied MARC records

Database vendors now make available files of records for their various aggregated databases. EBSCO Publishing has undertaken a pilot project for the PCC Task Group on Aggregator Databases to derive records for aggregated databases, and their records are freely available to subscribers. When the University of South Australia subscribed to the Ebsco MegaFile offer in late 1999, the availability of full MARC records was regarded as a definite advantage. However, these records required preprocessing to include UniSA-specific information, change the supplied title-level URLs to incorporate Digital Island access, and add a second URL for off-campus clients. Additional edits included changing the GMD from [computer file] to [electronic journal] and altering subject heading form subdivision coding from 'x' to 'v'. Again, to enable bulk deletion for maintenance purposes, a tag to create a default holding was required.

The Perl scripts for these files do string pattern matching and substitution, or [semi-global] find-and-replace operations. In many cases these changes could be done with a decent text editor with find/replace capabilities, and if dealing with the records on a one-off basis this is a practical process.
However, aggregator databases are notoriously volatile – changing content frequently – and hence the record sets need to be deleted and new files downloaded from the vendor site, edited and loaded to the library system. So it is worth spending a little time writing a custom Perl editing script. Appendix 3 shows a script to edit Ebsco-sourced records. Until mid-2000, Ebsco did not include publishers' embargo periods in their MARC records but maintained a separate embargoes page – hence further scripting was needed to incorporate this information. Vendor MARC records are also available for the Gale and Proquest databases. A variation of this process is also used to preprocess netLibrary MARC records – adding a default holding and a second remote authentication URL, and editing the GMD.

Deriving MARC records for ejournals from print records

The third scenario where Perl scripts are used with MARC records is deriving records for the electronic version from existing records. At UniSA we have reworked existing MARC records for print titles to create ejournal records for APAIS FullText. No records were available for the ejournals, and as we already had print records for a majority of titles, it was decided to rework these records into ejournal records. Title, ISSN and coverage information was captured from the Informit site and edited into a spreadsheet. During the pre-subscription evaluation process, APAIS FullText titles had been searched against the UniSA catalogue and the bibkeys of existing records noted. MARC records for these titles were exported from the catalogue as tagged text. For the titles not held at UniSA, bibliographic records were captured to file from Kinetica and then converted to tagged text. The ISSN and coverage data was also exported in tab-delimited format from the spreadsheet. By matching on ISSN, the fulltext coverage information could be linked to each title and incorporated into the MARC record.
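The ISSN matching just described amounts to a lookup table: read the spreadsheet export into a hash keyed on ISSN, then, for each record, pull the ISSN out of the 022 field and look up its coverage. A minimal sketch with invented data (the real script also handles missing ISSNs and multiple records):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Coverage data keyed on ISSN, as exported from the spreadsheet
# (ISSN and coverage invented for illustration)
my %coverage = ('1035-7521' => 'Fulltext from: 1995.');

# One tagged-text record (also invented)
my $record = "022   |a1035-7521\n245 00|aSample Journal\n";

# Pull the ISSN from the 022 field and attach the matching coverage note
if ($record =~ /^022\s+\|a([0-9X-]+)/m && exists $coverage{$1}) {
    $record .= "856 40|z$coverage{$1}\n";
}

print $record;
```

Hash lookup keeps the matching fast even across hundreds of titles, and records whose ISSN has no entry simply pass through unchanged.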
The records were edited following the PCC's (2000) proposed data elements for machine-derived records – deleting unwanted fields, adding and editing fields as needed. A globals file was used to add tag 006 and 007 data, a tag 530 additional physical form note, a 590 local access information note, a 773 host-item entry for the database, a 710 for the vendor Informit, and a default local holdings tag. The Perl script to process the records is longer than the earlier examples but no more complex – it just does more deleting, updating and reworking. Appendix 4 shows an example of a print record for APAIS FullText – the original print form, the edited form, the globals file and the final record as an ejournal.
  • 5. Conclusion

While Perl is currently mostly used to deal with the challenges of providing and maintaining MARC records for electronic resources, scripts are also used to post-process original cataloguing for all formats for batch uploading to Kinetica. The uses of Perl in the cataloguer's toolkit can be many and varied – it is a not-so-little language that can and does! And it's fun!

Appendix 1 – Perl script to edit Emerald titles file

#!/usr/local/bin/perl
# Script to edit Emerald tab-delimited title file into tagged text
# Entries contain Title, ISSN, Coverage and specific URL
# Written: Jenny Quilliam   Revised: August 2001
# Command line >perl Emerald_RPA.pl [INPUT FILE] [OUTPUT FILE]
#
#################################################################################
$TheFile = shift;
$OutFile = shift;
open(INFILE, $TheFile) or die "Can't open Input\n";
open(OUTFILE, ">$OutFile") or die "Can't open Output\n";
# control structure to read and process each line from the input file
while (<INFILE>) {
    s/"//g;    # deleting any quote marks from the string
    $TheLine = $_;
    chomp($TheLine);
    # parsing the contents at the tab delimiters to populate the variables
    ($ISSN, $Title, $Coverage, $URL) = split(/\t/, $TheLine);
    # printing out blank line between records
    print OUTFILE "\n";
    # processing ISSN
    print OUTFILE "022   |a$ISSN\n";
    # processing Title - fixing filing indicators
    # checking for leading The in Title
    if ($Title =~ /^The /) {
        print OUTFILE "245 04|a$Title|h[electronic journal]\n";
    }
    else {
        print OUTFILE "245 00|a$Title|h[electronic journal]\n";
    }
    # processing to generate URL tag with Coverage info
    print OUTFILE "856 40|zFulltext from: $Coverage.";
    print OUTFILE " This electronic journal is part of the Emerald database.";
    print OUTFILE " Access within University network.|u$URL\n";
    # adding generic RPA URL link to all records
    print OUTFILE "856 41|zAccess outside University network.";
    print OUTFILE "|uhttp://librpa.levels.unisa.edu.au/rpa/webauth.exe?rs=emerald\n";
}
close(INFILE); close(OUTFILE);
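As a quick sanity check, the split-and-tag logic of the script above can be exercised on a single line built from the Appendix 2 sample record; this sketch parses the coverage and URL fields but prints only the 022 and 245 tags.

```perl
#!/usr/local/bin/perl
# Sketch only - exercises the Appendix 1 parsing logic on one sample line
# taken from the Appendix 2 example record.
use strict;
use warnings;

my $TheLine = "0945-7517\tAsia Pacific Journal of Marketing & Logistics\t1998\thttp://www.emeraldinsight.com/094-57517.htm";
my ($ISSN, $Title, $Coverage, $URL) = split(/\t/, $TheLine);

# the same filing-indicator check as the full script
my $Indicators = ($Title =~ /^The /) ? "04" : "00";
print "022 |a$ISSN\n";
print "245 $Indicators|a$Title|h[electronic journal]\n";
```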
Appendix 2 – Globals file and example of tagged text for Emerald titles

006 m d
007 cr cn-
008 001123c19uu9999enkuu p 0 a0eng d
040 |aSUSA|beng|cSUSA
260 |a[Bradford, England :|bMCB University Press.]
530 |aOnline version of the print publication.
590 |aAvailable to University of South Australia staff and students. Access is by direct login from computers within the University network or by authenticated remote access. Articles available for downloading in PDF and HTML formats.
773 0 |tEmerald
991 |cEJ|nCAE|tNFL
___________________________________________________________________________
001 jaq00-05205
245 00|aAsia Pacific Journal of Marketing & Logistics|h[electronic journal]
022 |a0945-7517
856 40|zFulltext from: 1998. This electronic journal is part of the Emerald library database. Access within University network.|uhttp://www.emeraldinsight.com/094-57517.htm
856 41|zAccess outside University network.|uhttp://librpa.levels.unisa.edu.au/rpa/webauth.exe?rs=emerald

Appendix 3 – Perl script to edit Ebsco-sourced records

#!/usr/local/bin/perl
#
# Author: Jenny Quilliam   November 2000
#
# Program to edit EbscoHost records [as converted to text using marc.pl]
# GMD to be altered to: electronic journal
# Form subfield coding to be altered to v
# French subject headings to be deleted
# Fix URL to incorporate Digital Island access
# Command line string, takes 2 arguments:
# Command line: mlx> perl EHedit.pl [input filename] [output filename]
#############################################################################
$TheFile = shift;
$OutFile = shift;
open(INFILE, $TheFile) or die "Can't open input\n";
open(OUTFILE, ">$OutFile") or die "Can't open output\n";
while (<INFILE>) {
    $TheLine = $_;
    # processing selected lines only
    # editing the GMD in the 245 field from [computer file] to [electronic journal]
    if ($TheLine =~ /^245/)
        { $TheLine =~ s/computer file/electronic journal/g; }
    # editing subject headings to fix form subdivision subfield character
    if ($TheLine =~ /^65/)
        { $TheLine =~ s/xPeriodicals/vPeriodicals/g; }
    # editing out French subject headings
    if ($TheLine =~ /^650 6/) {next}
    # editing URL to add .global to string for Digital Island address
    if ($TheLine =~ /^856/)
        { $TheLine =~ s/search.epnet/search.global.epnet/g; }
    print $TheLine;
    print OUTFILE $TheLine;
}
close(INFILE);
close(OUTFILE);
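The effect of the four edits can be seen by running the same substitutions over a few sample record lines; the sample lines below are invented for illustration, and the edits are applied in-memory rather than from a file.

```perl
#!/usr/local/bin/perl
# Sketch only - applies the Appendix 3 edits to invented sample lines.
use strict;
use warnings;

my @Lines = (
    "245 00|aLibrary hi tech|h[computer file].\n",
    "650  0|aLibrary science|xPeriodicals.\n",
    "650 6|aBibliotheconomie|xPeriodiques.\n",
    "856 40|uhttp://search.epnet.com/example\n",
);
my $Output = "";
for my $TheLine (@Lines) {
    next if $TheLine =~ /^650 6/;                                   # drop French headings
    $TheLine =~ s/computer file/electronic journal/g  if $TheLine =~ /^245/;
    $TheLine =~ s/xPeriodicals/vPeriodicals/g         if $TheLine =~ /^65/;
    $TheLine =~ s/search\.epnet/search.global.epnet/g if $TheLine =~ /^856/;
    $Output .= $TheLine;
}
print $Output;
```

The French heading is dropped entirely, the GMD and form subdivision are rewritten, and the URL gains the .global host segment.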
Appendix 4 – APAIS FullText examples

Print record

LDR 00824nas 2200104 a 4500
001 dup91000065
008 820514c19739999vrabr p 0 0 0eng d
022 0 $a0310-2939
035 $a(atABN)2551638
035 $u145182
040 $dSIT$dSCAE
043 $au-at---
082 0 $a639.9$219
245 00 $aHabitat Australia.
259 00 $aLC$bP639.9 H116$cv.2, no.1 (Mar. 1974)-
260 01 $aHawthorn, Vic. :$bAustralian Conservation Foundation,$c1973-
300 $av. :$bill. (some col.), maps ;$c28 cm.
362 0 $aVol. 1, no. 1 (June 1973)-
580 $aAbsorbed Peace magazine Australia. Vol 15, no. 4 (Aug. 1987)
650 0 $aNatural resources$xResearch$zAustralia.
650 0 $aConservation of natural resources$zAustralia.
710 20 $aAustralian Conservation Foundation.
780 05 $tPeace magazine Australia$x0817-895X
984 $a2036$cCIT PER 304.2 HAB v.1 (1973)-$cUND PER 304.2 HAB v.1 (1973)-$cMAG PER 333.9506 H116 v.1 (1973)-$cSAL PER 333.705 H11 v.1 (1973)-
EndRecord

Edited record

LDR 00824nas 2200104 a 4500
001 jaq01-0607
008 820514c19739999vrabr p 0 0 0eng d
022 0 |a0310-2939
082 0 |a639.9|219
245 00|aHabitat Australia|h[electronic journal].
260 |aHawthorn, Vic. :|bAustralian Conservation Foundation,|c1973-
362 0 |aVol. 1, no. 1 (June 1973)-
580 |aAbsorbed Peace magazine Australia. Vol 15, no. 4 (Aug. 1987)
650 0|aNatural resources|xResearch|zAustralia.
650 0|aConservation of natural resources|zAustralia.
710 2 |aAustralian Conservation Foundation.
780 05|tPeace magazine Australia|x0817-895X
856 41|zSelected fulltext available: Vol. 24- (June 1996-). Access via Australian public affairs full text.|uhttp://www.informit.com.au
991 |cEJ|nCAE|tNFL

Globals file

006 m d
007 cr anu
040 |aSUSA
530 |aOnline version of the print title.
590 |aAvailable to University of South Australia staff and students. Access is by direct login from computers within the University network or by login and password for remote users. File format and amount of fulltext content of journals varies.
710 2 |aInformit.
773 0 |tAustralian public affairs full text|dMelbourne, Vic. : RMIT Publishing, 2000-.
991 |cEJ|nCAE|tNFL
References

Anderson, B. 1999, 'Cataloging issues', paper presented to Technical Services Librarians: the training we need, the issues we face, PTPL Conference 1999. http://www.lib.virginia.edu/ptpl/anderson.html

Christiansen, T. & Torkington, N. 1998, Perl cookbook, O'Reilly, Sebastopol CA.

FLICC Personnel Working Group 2001, Sample KSAs for librarian positions: catalogers. http://www.loc.gov/flicc/wg/ksa-cat.html

Hoffman, P. 1997, Perl 5 for dummies, IDG Books, Foster City CA.

Huggard, S. & Groenewegen, D. 2001, 'E-data management: data access and cataloguing strategies to support the Monash University virtual library', LASIE, April 2001, p. 25-42.

Library of Congress's MARCBreaker and MARCMaker programs, available at: http://lcweb.loc.gov/marc/marc/marctools.html

Program for Cooperative Cataloging Task Group on Aggregator Databases 2000, Final report. http://lcweb.loc.gov/catdir/pcc/aggfinal.html

Reese, T., MarcEdit 3.0 program, available at: http://ucs.orst.edu/~reeset/marcedit/index.html

Schwartz, R. & Christiansen, T. 1997, Learning Perl, 2nd ed., O'Reilly, Sebastopol CA.

Silver, N., Perl tutorial. http://fpg.uwaterloo.ca:80/perl/

Take 10 min to learn Perl. http://www.geocities.com/SiliconValley/7331/ten_perl.html

Tennant, R. 1999, 'Skills for the new millennium', LJ Digital, January 1, 1999. http://www.libraryjournal.com/articles/infotech/digitallibraries/19990101_412.htm

Thomas, S., marc.pl utility, available at: http://www.library.adelaide.edu.au/~sthomas/scripts/

Using MARC.pm with batches of MARC records: the VIVA experience, 2000. [Online] http://marcpm.sourceforge.net/examples/viva.html

Author

Jenny Quilliam
Coordinator (Records) Technical Services
University of South Australia Library
Email: jenny.quilliam@unisa.edu.au