This is an archive of a webinar delivered on January 12, 2012. Description: If you’re really new to cataloging, this session is for you. In this 90-minute online session, facilitated by NEKLS technology librarian Heather Braum, you will:
learn the basic principles behind cataloging,
discover why librarians catalog,
learn to read a basic MARC record,
see what a good MARC record looks like,
learn basic cataloging terminology,
and practice describing different materials.
Special thanks to Robin Fay for allowing me to use a couple of the ideas shared in this webinar and presentation. See her outstanding slides: http://www.slideshare.net/robinfay/cataloging-basics-presentation.
Library classification involves arranging books and materials in a logical order to help users find what they need easily. It can be done through enumerative systems that list subjects alphabetically and assign numbers, hierarchical systems that divide subjects from general to specific, or faceted systems that break down subjects into orthogonal components. The key goals of classification are to provide a helpful arrangement, allow for revisions to accommodate new topics, and make the system simple for users to understand and apply.
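The faceted approach described above can be sketched in code: instead of looking a subject up in one fixed list, a notation is built by combining codes from independent facets. The facet names, codes, and colon notation below are purely illustrative, not drawn from any real scheme.

```python
# Minimal sketch of a faceted classification: a subject is described by
# combining independent facets rather than found in one enumerated list.
# Facet names and codes here are hypothetical, not a real scheme.

FACETS = {
    "discipline": {"medicine": "61", "agriculture": "63"},
    "topic":      {"education": "37", "statistics": "31"},
    "place":      {"india": "(540)", "japan": "(520)"},
}

def build_notation(**chosen):
    """Combine one code per chosen facet into a single class notation."""
    parts = []
    for facet in ("discipline", "topic", "place"):
        if facet in chosen:
            parts.append(FACETS[facet][chosen[facet]])
    return ":".join(parts)

print(build_notation(discipline="medicine", place="india"))  # 61:(540)
```

Because the facets are orthogonal, new subjects need no new list entries; any combination of existing codes is a valid class.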
The first part of a day-long presentation made on November 3, 2009, covering various aspects of library cataloging, MARC records, FRBR, RDA, authority control, etc.
The document describes the history and concepts of bibliographic control, from antiquity to current models. It covers the evolution of libraries and bibliographies, the institutionalization of bibliographic control, and UNESCO's proposed model of Universal Bibliographic Control. It also discusses the role of technology and the internet in making bibliographic information more accessible.
The document discusses the history and features of the 23rd edition of the Dewey Decimal Classification system. It provides details on the system's development since 1876, its structure involving 10 main classes and use of decimals, and new features in the 23rd edition like representation of groups of people, revisions to standard subdivisions, and changes to better organize knowledge on the internet.
The document provides a history of school library media programs, tracing the evolution of the concept from simple book repositories to full-fledged instructional centers integrating various media and resources. It describes how influential reports in 1945, 1960, and 1969 established standards and definitions. The document also outlines three "revolutions" that modernized school libraries beginning in the late 1940s by adding audiovisual materials, integrating instruction, and promoting active participation in teaching. Subsequent guidelines in 1988, 1998, and 2009 further advanced the role of the library media specialist in curriculum development and ensuring students become information literate.
This document provides an overview of cataloging and descriptive cataloging according to AACR2 standards. It discusses the key elements and areas of a bibliographic record, including:
1) The title and statement of responsibility area, which includes the title proper, parallel titles, other title information, and statements of responsibility.
2) Additional areas like edition, publication details, physical description, and notes.
3) The use of punctuation and layout conventions to distinguish between these different elements according to cataloging rules. The goal is to uniquely identify and describe items so they can be found by library users.
Seminar on current awareness service and selective dissemination of information (SDI).
SDI is a type of current awareness service meant to keep users abreast with the latest developments in their fields of interest. It is a personalized service that provides pinpointed information to individual users or groups based on their predefined information needs. SDI involves reviewing documents, selecting relevant information, and notifying users of matching items so they can stay well-informed on topics important to their work.
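The SDI workflow described above — predefined interest profiles matched against incoming documents to produce notifications — can be sketched as a toy matcher. The profiles, documents, and keyword-overlap rule are all invented for illustration.

```python
# Toy sketch of SDI: each user has a profile of interest keywords;
# new documents are matched against profiles to produce notifications.
# All data and the overlap rule are illustrative.

profiles = {
    "alice": {"cataloging", "marc"},
    "bob": {"classification", "dewey"},
}

documents = [
    {"title": "Intro to MARC records", "keywords": {"marc", "cataloging"}},
    {"title": "DDC 23rd edition", "keywords": {"dewey", "classification"}},
]

def notify(profiles, documents):
    """Return, per user, titles whose keywords overlap the user's profile."""
    out = {}
    for user, interests in profiles.items():
        out[user] = [d["title"] for d in documents if d["keywords"] & interests]
    return out

print(notify(profiles, documents))
```

A production service would add relevance ranking and feedback loops to refine profiles over time, but the core is this profile-versus-document match.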
The document discusses the objectives, purposes, and functions of a library catalogue. It defines a library catalogue as a list of print and non-print materials accessible from a particular library. The main purposes of a library catalogue are to serve as a guide to the library's collection and to aid users in locating materials. An effective catalogue should enable users to find materials by author, title, subject, and other access points. The cataloging process involves preparing bibliographic records that describe materials and provide standardized subject headings and classifications.
This document discusses the cataloging of computer software. It defines electronic resources according to AACR2R and notes that direct access resources come in containers to be used with a computer, while remote access resources are available online. It provides guidance on choosing the main entry, transcribing titles and statements of responsibility, and notes various areas of the catalog record including physical description, accompanying materials, and notes.
Presentation on "Cataloguing" delivered during a training workshop in library science for the staff of Muktangan school libraries, organised by the Muktangan School Teacher Reference Library, Mumbai, on 15 November 2010.
Bibliographic control and library automation have evolved significantly over time. Standardized formats like MARC have facilitated processing and cataloging workflows. The Library of Congress and bibliographic utilities like OCLC have played key roles in developing shared bibliographic databases and standards. While some libraries conduct original cataloging, many engage in copy cataloging and leverage records from these central sources. Centralized and cooperative approaches help improve efficiency.
This document proposes automating the library at NTU FSD. It discusses the need for library automation to improve access and services. The objectives of automation include maintaining bibliographic records, providing catalog access, and implementing new IT processes. Selection criteria for an integrated library system include functionality, user interface, standards support, scalability, and costs. The proposal recommends analyzing needs, developing criteria, evaluating systems, and issuing a request for proposal to potential vendors. The implementation process involves strategic planning, data conversion, pilot testing, and post-implementation review.
This document provides guidance on creating a personal learning network (PLN) using various online tools and platforms. It discusses how to identify and connect with experts in your field on Twitter, blogs, and social bookmarking services. It also explains how to maintain your PLN by sharing knowledge and experience. The overall goal of a PLN is to guide independent learning and professional development through curating a network of online resources and connections.
Library automation refers to the application of computers and related technologies to perform traditional library operations such as acquisition, cataloguing, circulation, serials control and reference services. The key objectives of library automation are to improve control over collections, provide effective access to resources and share resources among libraries. Some advantages include increased efficiency of operations, improved access to information and ability to share resources. Challenges include initial costs, need for training staff and keeping systems up to date with new technologies. Current trends in library automation include web-based library management systems, mobile technologies and cloud computing.
The document discusses digital reference services provided by libraries. It defines digital reference as reference services provided electronically over the internet through means like email, web forms, and chat. The rise of digital reference is due to more people accessing library resources online and needing information anytime, anywhere. Digital reference aims to identify user needs, develop search strategies, and satisfy users with authoritative information. It allows remote access and expanded service hours. Common forms of digital reference include email, web forms, chat applications, instant messaging, and video. Libraries must train staff, design interfaces, test services, and address legal and quality issues to effectively provide digital reference.
Outlines features of the Dewey Decimal Classification, including its use of decimal notation. The UDC is distinctive in that it combines both enumerative and analytical (faceted) schemes.
This document discusses a Keyword Out of Context (KWOC) index, where keywords are extracted from titles and displayed as headings, with an asterisk replacing the keyword in the title. A KWOC index rotates keywords to the left margin to serve as access points, similar to subject headings. Entries under each keyword contain the title and page number but replace the keyword with an asterisk or symbol in the title. An example of a KWOC index is provided.
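The KWOC mechanism described above is simple enough to sketch directly: each significant title word is rotated out as a heading, and the entry shows the title with that word masked by an asterisk. The stopword list and sample title below are illustrative only (a real index would also carry page or document numbers).

```python
# Minimal KWOC index: each significant keyword from a title becomes a
# heading; within each entry the keyword is replaced by an asterisk.
# Stopword list and sample titles are illustrative.

STOPWORDS = {"a", "an", "the", "of", "and", "to", "in"}

def kwoc(titles):
    """Map each significant word to title entries with that word masked."""
    index = {}
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.lower()
            if key in STOPWORDS:
                continue
            entry = " ".join(words[:i] + ["*"] + words[i + 1:])
            index.setdefault(key, []).append(entry)
    return index

for key, entries in sorted(kwoc(["History of Library Automation"]).items()):
    print(key.upper(), entries)
```

Running this on the single title yields headings AUTOMATION, HISTORY, and LIBRARY, each with the title shown and the heading word replaced by `*`.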
1. The Nepal National Library was established in 1957 with a collection of 30,000 manuscripts and printed books purchased from a private collection. It was initially housed in Singha Durbar with limited space and outdated furniture.
2. By the 1950s, with the advent of democracy and growing interest in education, the need for public libraries and library services was felt. Existing academic libraries played an important role in collecting books to support the growing number of schools and colleges.
3. The establishment of diplomatic relations with other countries after 1951 resulted in the opening of missionary libraries like the American Library and Indian Library to promote cultural relations and information exchange.
This document provides an introduction and overview of Resource Description and Access (RDA), the new cataloging standard that replaces Anglo-American Cataloguing Rules (AACR). RDA is designed for the digital age and is based on Functional Requirements for Bibliographic Records (FRBR) and Functional Requirements for Authority Data (FRAD). RDA provides more flexibility and is compatible with current metadata standards and encoding formats like MARC. While RDA has some advantages, there are also ongoing considerations and discussions around its implementation.
An article on how to manage special libraries.
Includes:
- Aspects in special library management
- Problems, challenges and opportunities involved in managing a special library
Course: LIBSCI 36 - Special/Public Librarianship
Teacher: Elizabeth Banlat
The document summarizes the historical development of library automation from the 1930s to present. It discusses the early experimental phase using technologies like punched cards. The local systems phase in the 1960s-1970s saw the first application of general purpose computers to offline library systems. The cooperative systems phase beginning in 1970 featured the growth of online systems and library networks for resource sharing. Library automation has since developed further with the rise of the internet, online public access catalogs, and other digital technologies.
This document discusses library automation for serial management. It begins with definitions of serials as publications intended to be indefinitely continuing, such as magazines, newspapers, and journals. It then outlines the complex procedures required to manage serial collections and how automation can help address issues like tracking missing issues and claims. The document details the key components and functions needed in an automated serials control system, including the bibliographic database, searching and access capabilities, and automated support for selection, acquisition, check-in, routing, and other processes.
This document provides an introduction to the MARC (Machine-Readable Cataloging) format, which is designed to store bibliographic information. It explains that a MARC record contains tags, indicators, and data fields, and includes examples of common fields such as the control number, date of publication, and physical description. It also defines key terms such as tags, indicators, and subfield codes, and the main components of a MARC record.
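The pieces named above — tags, indicators, subfield codes — can be illustrated with plain Python structures (no MARC library): each data field carries a three-digit tag, two indicators, and subfields keyed by one-character codes. The record values are invented placeholders.

```python
# Sketch of the logical pieces of a MARC record using plain structures.
# Field contents are invented examples, not a real catalog record.

record = {
    "leader": "00000nam a2200000 a 4500",
    "control_fields": {"001": "123456789"},       # control number
    "data_fields": [
        {"tag": "245", "ind1": "1", "ind2": "0",  # title statement
         "subfields": [("a", "Cataloging basics :"),
                       ("b", "an introduction /"),
                       ("c", "Jane Doe.")]},
        {"tag": "300", "ind1": " ", "ind2": " ",  # physical description
         "subfields": [("a", "214 p. ;"), ("c", "23 cm.")]},
    ],
}

def get_subfield(record, tag, code):
    """Return the first subfield value matching tag and code, or None."""
    for f in record["data_fields"]:
        if f["tag"] == tag:
            for c, value in f["subfields"]:
                if c == code:
                    return value
    return None

print(get_subfield(record, "245", "a"))  # Cataloging basics :
```

Real MARC is a serialized exchange format with a leader and directory; the dict here only mirrors its logical tag/indicator/subfield hierarchy.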
This document provides guidelines for transcribing titles, statements of responsibility, and other elements from the chief source of information for catalog records. Key points include:
- Transcribe titles exactly as they appear on the chief source, omitting unnecessary punctuation. Supply translations for titles in other languages.
- Indicate general material designations, parallel titles, and other title information in a standardized way.
- Transcribe statements of responsibility prominently displayed on the item. Supply responsibility statements from other sources in brackets.
- Follow specific rules for abbreviating, punctuating, and formatting titles and related elements to ensure consistency across records.
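The punctuation conventions those rules prescribe can be sketched as a small assembler for the title and statement of responsibility area, using ISBD-style separators: " = " before a parallel title, " : " before other title information, " / " before the first statement of responsibility. The example data is invented.

```python
# Sketch of assembling the title and statement of responsibility area
# with ISBD-style prescribed punctuation. Example values are invented.

def title_area(title_proper, parallel=None, other=None, responsibility=None):
    """Join the elements with their prescribed separators."""
    area = title_proper
    if parallel:
        area += " = " + parallel       # parallel title
    if other:
        area += " : " + other          # other title information
    if responsibility:
        area += " / " + responsibility # statement of responsibility
    return area

print(title_area("Cataloging basics",
                 other="an introduction",
                 responsibility="by Jane Doe"))
# Cataloging basics : an introduction / by Jane Doe
```

Because the punctuation is prescribed rather than transcribed, a user (or a parser) can tell which element each fragment belongs to without any labels.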
The document discusses the concepts of cataloging, including:
- Original cataloging is creating records from scratch while copy cataloging adapts existing records
- Cataloging involves bibliographic description, subject analysis, classification, and physical preparation
- Standards like ISBD and AACR2 provide rules for cataloging to ensure consistency
- FRBR and RDA aim to update cataloging standards for the digital era
This document discusses RDA, FRBR, and FRAD and how they connect cataloging principles and standards. It provides background on FRBR and FRAD and their conceptual models of bibliographic resources and relationships. It then explains how RDA is based on FRBR and FRAD principles and is designed for the digital environment. Key differences between RDA and AACR2 are outlined such as a broader scope, being principle-based rather than rule-based, and emphasizing user tasks. Implementation plans target the first quarter of 2013 for major libraries to transition to RDA.
This document provides an overview of RDA (Resource Description and Access), FRBR (Functional Requirements for Bibliographic Records), and FRAD (Functional Requirements for Authority Data). It discusses how RDA is based on FRBR and aims to improve resource discovery by focusing on user tasks and clarifying relationships between works, expressions, manifestations and items. Key differences between RDA and AACR2 include RDA being more principles-based, user-focused, and designed for the digital environment. Major libraries plan to implement RDA in early 2013.
Maja Žumer: Library catalogues of the future: realising the old vision with n... (ÚISK FF UK)
The document discusses the future of library catalogs and metadata, noting that catalogs need to change to meet new user needs and expectations by making data more intuitive to explore, exposing relationships between works and other entities, and fully utilizing the quality of library metadata. It also reviews the history and conceptual models for bibliographic data like FRBR, FRAD, and FRSAD, which aim to present bibliographic information in a more user-oriented way. Libraries will need new systems built on these conceptual models to improve user tasks like finding, identifying, selecting, and exploring materials.
The document provides an overview of cataloging and discusses key concepts in cataloging like:
- Original vs copy cataloging
- Elements included in bibliographic description like author, title, publisher
- Standardization provided by ISBD and AACR2 rules
- Transition to new models like FRBR and RDA that aim to improve user tasks like finding, identifying, and selecting materials
This document provides an overview of cataloging concepts and standards. It discusses what cataloging is, the different types of cataloging (original vs. copy cataloging), and the key elements included in catalog records like bibliographic description, subject analysis, and classification. The document also explains historical standards like ISBD and AACR2 and emerging models like FRBR, FRAD, and RDA which aim to improve resource discovery in the digital age. While original catalogers need detailed rules knowledge, most school librarians can get by with a general understanding of standards and knowing where to find detailed rules when needed.
The document discusses the concepts and standards involved in cataloging library materials, including:
- Bibliographic description, subject analysis, and classification are the main elements of cataloging.
- There are two types of cataloging: original cataloging which is done from scratch, and copy cataloging which adapts existing records.
- Cataloging standards include ISBD for bibliographic description order/punctuation, and AACR2 rules.
- FRBR is a conceptual model that aims to improve user tasks like finding, identifying, selecting, and obtaining materials. RDA and FRBR are the new standards replacing AACR2.
FRBR stands for Functional Requirements for Bibliographic Records.
FRBR is a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA).
It models the user tasks of retrieval and access in online library catalogues and bibliographic databases from the user's perspective.
It offers a new conceptual model of the bibliographic universe with a strong focus on users.
The purpose of the entity-relationship analysis was to uncover the logical nature of bibliographic data in terms of entities, attributes, and relationships.
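The entity-relationship structure FRBR describes can be sketched with plain classes for its Group 1 entities: a Work is realized through Expressions, embodied in Manifestations, and exemplified by Items. The class layout and attributes below are a simplified illustration, not a complete rendering of the model.

```python
# Sketch of FRBR Group 1 entities as plain Python dataclasses.
# Attributes are illustrative; the real model defines many more.
from dataclasses import dataclass, field

@dataclass
class Item:                      # a single physical or digital copy
    barcode: str

@dataclass
class Manifestation:             # a publication (edition, format)
    isbn: str
    items: list = field(default_factory=list)

@dataclass
class Expression:                # a realization (a text, a translation...)
    language: str
    manifestations: list = field(default_factory=list)

@dataclass
class Work:                      # the abstract intellectual creation
    title: str
    expressions: list = field(default_factory=list)

work = Work("Hamlet",
            [Expression("eng",
                        [Manifestation("978-0-00-000000-0",
                                       [Item("b1")])])])
print(work.expressions[0].manifestations[0].items[0].barcode)  # b1
```

Modeling the hierarchy explicitly is what lets a catalog collocate every edition and copy of a work under one heading, rather than listing them as unrelated records.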
1) Knowing the basic elements of a bibliographic record like author, title, publisher.
2) Understanding the difference between copy and original cataloging.
3) Being able to locate and utilize existing catalog records from databases.
A working knowledge of cataloging best practices will allow you to effectively describe and organize your school library collection so students can find what they need. But you don't need to master every rule; a general understanding of the standards, plus knowing where to look up the details, is enough.
This document discusses digital libraries and provides examples of metadata for describing a map from the Library of Congress American Memory collection using Dublin Core elements. It defines key aspects of digital libraries including content, users, and services. Metadata examples are given for elements like title, subject, description, and creator to catalog the historical map. The document demonstrates how Dublin Core can be used to provide structured descriptive information about digital objects.
This document discusses digital libraries and their components. It defines a digital library as a managed collection of digital objects that are accessible over a network. Digital libraries have streams of content like text, video and audio, as well as structures for organizing content and spaces for indexing and retrieving items. Services are provided to users through scenarios, while societies define the communities that digital libraries serve. The 5S model is presented as a way to conceptualize the different aspects of a digital library, including streams, structures, spaces, scenarios and societies.
The document discusses the Book of the Dead Project which aims to create digital editions of ancient Egyptian manuscripts using semantic web standards like CIDOC-CRM, FRBRoo and RDFa. It focuses on modeling the relationships in Malcolm Mosher's work on the Book of the Dead, capturing concepts like spells, translations, and depictions. The project uses the ResearchSpace platform to facilitate collaborative annotation and exploration of the manuscripts and related artifacts.
RDA is a new cataloging standard that aims to make resource description and access more intuitive for users. It is based on FRBR and FRAD models established by IFLA that define bibliographic entities and their attributes and relationships. RDA seeks to accommodate all types of resources, align with the semantic web, and simplify the cataloging process by focusing on recording attributes as they appear. It was implemented in 2013 and emphasizes direct transcription over abbreviations to create more user-friendly records. RDA aims to improve users' ability to find, identify, select, and acquire resources through catalog searches.
MARC 21 Training at Daffodil International University, by Nur Ahammad
MARC 21 is a standard format for cataloging library resources that allows bibliographic information to be shared among different library systems. It structures data into fields like author, title, subject headings. A MARC record contains a leader field, fixed fields with coded info, and variable fields for descriptive data. Practical experience with MARC 21 cataloging is needed to truly master it, as simply teaching the standard is not enough. The format was developed in the 1960s and continues to be updated to meet evolving library needs.
This document discusses standards related to archival description, including EAD, DACS, and MARC. It provides an overview of each standard and their purposes. EAD is an XML standard for encoding finding aids to display them online. DACS is a content standard that does not prescribe structure, leaving that to EAD. MARC was originally created for libraries but has been adapted for archival use through standards like APPM and ACM to represent archival materials and collections.
This document provides a summary of a presentation about the transition from AACR to RDA (Resource Description and Access).
1) AACR has served libraries well for decades but is no longer suitable for the digital world. RDA is being developed as the new cataloguing standard to address this issue and ensure catalog data is usable online.
2) RDA is based on FRBR (Functional Requirements for Bibliographic Records) and other conceptual models which define bibliographic entities, attributes, and relationships to improve user tasks like finding and identifying resources.
3) Early implementations of RDA show benefits like more organized displays that are easier for users to navigate compared to traditional catalogs without FRBR principles
NASIG Webinar 2014, "From Record-Bound to Boundless: FRBR, Linked Data and New...", by Juliya Borie
The use of linked data within the library community has the potential to significantly impact cataloging and may help improve information discovery and retrieval for the end user. For librarians and users alike, serial publications have been a constant challenge due to their complex publication histories and fluid nature. In this webinar, the presenters will reprise their NASIG 2013 Conference presentation, providing an overview of Linked Data developments within the library and journal publishing communities. By exploring serials in relation to FRBR principles and linked data modeling techniques, the presenters will describe how a search for periodical literature might be improved in a linked data environment. Taking description out of the current record constraints, serials librarians will be able to express the relationships between multiple versions of the same publication, and document how a particular journal has changed over time. The linked data model also opens up many opportunities for the provision of value-added content to bibliographic descriptions.
The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the "Functional Requirements" family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.
This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.
RDA (Resource Description and Access) is a new standard for describing library resources, designed to replace AACR2. Library staff, including public services, systems personnel, and catalogers, may have heard mention of RDA but not know much about it or how it will change their daily work. You may have many questions. What is RDA? We'll give a very little bit of history and theoretical background. What is this going to mean for catalogers, ILS managers, and users in the near term? What are the future implications, or, why are we doing this? What are the juicy bits of controversy in cataloger-land? And finally, Do we HAVE to? We'll talk for a while, have some activities that get you thinking, and find out your thoughts on RDA.
Presented at "Captains & Crew Collaborating," the 8th annual paraprofessional conference at J.Y. Joyner Library, East Carolina University.
Clusters from outer space: Primo Deduping and FRBRizing in Context and Reality
1. Clusters from Outer Space
Primo Deduping and FRBRizing in Context and Reality
Laura Akerman, Nathalie Schulz, Amelia Rowe
With help from Lukas Koster
IGELU Annual Meeting September 12 2017 St. Petersburg, Russia
2. 1. Why do librarians bring things together?
It’s called “collocation”...
4. Functional Requirements for Bibliographic Records, 1991
The study uses an entity analysis technique that begins by isolating the entities that are the key objects of interest to users of bibliographic records. The study then identifies the characteristics or attributes associated with each entity and the relationships between entities that are most important to users in formulating bibliographic searches, interpreting responses to those searches, and “navigating” the universe of entities described in bibliographic records.

IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records. K. G. Saur, München, 1998.
5. It’s all about what users do
● using the data to find materials that correspond to the user’s stated search criteria (e.g., in the context of a search for all documents on a given subject, or a search for a recording issued under a particular title);
● using the data retrieved to identify an entity (e.g., to confirm that the document described in a record corresponds to the document sought by the user, or to distinguish between two texts or recordings that have the same title);
● using the data to select an entity that is appropriate to the user’s needs (e.g., to select a text in a language the user understands, or to choose a version of a computer program that is compatible with the hardware and operating system available to the user);
● using the data in order to acquire or obtain access to the entity described (e.g., to place a purchase order for a publication, to submit a request for the loan of a copy of a book in a library’s collection, or to access online an electronic document stored on a remote computer).

IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records. K. G. Saur, München, 1998.
6. FRBR Work
work: a distinct intellectual or artistic creation.*
● An abstract entity - no one material item to point to
● Recognized in realizations or expressions
● Work is the commonality of content between and among various expressions (example: Homer’s Iliad)
● Sometimes difficult to define boundaries; differences may be cultural.

IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records. K. G. Saur, München, 1998.
7. FRBR Expression
expression: the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms
● Any change in intellectual or artistic content constitutes a new expression
● A change in form (e.g. from alphanumeric to spoken word) - a new expression
● Changes in physical form (e.g. typeface) are not a new expression
● Example of a new expression - a translation
● My own “layman’s” term would be “version”
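The Work/Expression relationship defined on these two slides can be sketched as a toy data model, using Homer's Iliad as in the example above. Class and attribute names here are invented for illustration; FRBR itself is a conceptual model, not a data format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Expression:
    """One intellectual/artistic realization of a work."""
    language: str
    form: str  # e.g. "text", "spoken word"

@dataclass
class Work:
    """The abstract creation; there is no one material item to point to."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

iliad = Work("Iliad")
iliad.expressions.append(Expression("Ancient Greek", "text"))
iliad.expressions.append(Expression("English", "text"))         # translation: new expression
iliad.expressions.append(Expression("English", "spoken word"))  # change of form: new expression
```

The one Work carries the commonality of content; each translation or change of form hangs off it as a distinct Expression.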
8. 2. How do librarians bring things together?
Technology made this change..
9. Card Catalog - linear arrangement
● Various ways of organizing cards, but the principle of bringing together the various versions of a work.
● “Deduping” could be adding call numbers for print and microform to the same card.

A. L. A. rules for filing catalog cards, 1942. “Second printing, with corrections, April 1943.” https://catalog.hathitrust.org/Record/002433836
10. Here you see in (b) an alternative rule, something like the origin of the “uniform title” concept - organizing all translations via a heading for the original title and language.
11. California Digital Library “dedup” algorithm
“DLA merges book format records through a complex algorithm that assigns numeric "weights" for matches on different parts of the bibliographic record. When the total of these weights reaches a certain level, the records are considered to be sufficiently alike to warrant bringing them together as a single database record. If the total weight does not reach this level, the records are not merged.

Not all data elements have to match exactly for the records to be merged. The use of weighting means that some variation between the records can be tolerated, as long as the overall score is high enough to be considered a match.”

Coyle, Karen. Technical Report No. 6: Rules for Merging MELVYL(R) Records. Revised June 1992 (copy provided privately).
See also Coyle, Karen, and Linda Gallaher-Brown. "Record matching: an expert algorithm." ASIS'85: Proceedings of the American Society for Information Science (ASIS) 48th Annual Meeting. Vol. 22. 1985.
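The weighted-matching idea described above can be sketched in a few lines. This is a hypothetical illustration in the spirit of the MELVYL approach, not the actual algorithm: the field names, weights, and threshold are invented, and the real rules are far more elaborate.

```python
import re

FIELD_WEIGHTS = {
    "isbn": 40,       # strong identifier
    "title": 25,
    "author": 15,
    "date": 10,
    "publisher": 10,
}
MATCH_THRESHOLD = 60  # records merge only if the total weight reaches this level

def normalize(value):
    """Crude normalization: lowercase, replace punctuation with spaces, collapse runs."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", value.lower())).strip()

def match_score(rec_a, rec_b):
    """Sum the weights of fields whose normalized values agree."""
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if rec_a.get(field) and rec_b.get(field)
        and normalize(rec_a[field]) == normalize(rec_b[field])
    )

def should_merge(rec_a, rec_b):
    return match_score(rec_a, rec_b) >= MATCH_THRESHOLD
```

With these invented weights, two records that agree on title, author, and date score only 50 and stay separate; a shared ISBN pushes them over the threshold. The same structure also shows why small title differences are all-or-nothing: a field either matches and contributes its full weight, or it contributes nothing.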
12. Other approaches
● VTLS Cataloging system based on FRBR entities: https://www.slideshare.net/VisionaryTechnology/vtls-8-years-experience-with-frbr-rda-4755109
● WorldCat Work Descriptions: http://www.oclc.org/developer/develop/linked-data/worldcat-entities/worldcat-work-entity.en.html
14. Primo Dedup ...
● Derived from the California Digital Library algorithm.
● Roughly equivalent to the FRBR “Expression” level - an edition of a book, a director’s cut of a movie, a recording of a symphony by a particular orchestra on a certain date.
● Should bring together issuances of the same content in different formats - print, electronic, microform, etc. (manifestations).
15. Primo Dedup merged record
● Provides a merged record PNX - selecting one description out of the “dups”, then adding from all the records:
○ local fields,
○ holdings/items from all the records.
● Primo’s selection of the “preferred record” is based on the “delivery category” assigned by the Primo norm rules. The current hierarchy is:
○ SFX resource
○ Electronic resource
○ Metalib resource
○ Physical item
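The preferred-record choice by delivery category can be sketched as follows. This is a minimal illustration of the ranking just listed, not Primo's actual implementation; the record shape is invented.

```python
# Delivery categories, best-ranked first, per the hierarchy above.
HIERARCHY = ["SFX resource", "Electronic resource", "Metalib resource", "Physical item"]

def preferred_record(records):
    """Pick the record whose delivery category ranks highest in HIERARCHY.
    Assumes every record carries a known 'delivery_category' value."""
    return min(records, key=lambda r: HIERARCHY.index(r["delivery_category"]))
```

Because the choice is purely category-driven, a very brief SFX record outranks a full print record every time, which is exactly the e-serials problem described in the Emory section later in the deck.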
16. Dedup - matching up “dups”
● Assign a “score” based on full or partial matching of selected fields, as indicated in the “dedup” section of the PNX (created by normalization rules).
● Same fields; different rules for serials, for articles, and for everything else.
● If the score meets the target number, it’s a match.
● The Primo ingest pipe calculates match scores for every incoming record and assigns a match ID associated with matching records. It also removes deleted records from a match ID cluster, and adds or removes records to a match ID if their score changes.
● If changes are made to the dedup normalization rules, the records would need to be updated (renormalization pipe or reload from source) for the change to take effect.
● A “force dedup” setting on a renormalization pipe might be needed if you tinkered with the dedup rules.
21. 5. Dedup at Emory Libraries (Laura)
● When we first implemented Primo in 2008-9, we experimented with FRBR but decided it was too confusing for users. But we wanted dedup.
● The intent of dedup was to bring print, microform, electronic, etc. versions of the same content together.
● Our big concern at implementation was that we were creating very brief records for electronic serials from SFX, and they became the “merge record”, so our lovely print CONSER full serial records disappeared. Our solution at that time was to add 856 URLs to the serial records, making the print record “electronic” to Primo’s norm rules, which put it on equal footing in the choice of merge record. This was too much manual work.
● With Alma, things are better for e-serials; Community Zone e-journal records are fuller, so we can choose fuller records for e-serials in Alma.
● From time to time, when we have had dedup problems, Ex Libris support staff have suggested we just use FRBR instead, but we have re-evaluated it and decided “no”.
22. The algorithm isn’t friendly to rare book cataloging.
The first edition and some of the rare editions of this book were deduping together.
Why? Dates...
23. Solution? Exclude the entire library or location where the rare stuff lives
(Screenshot of norm rules)
25. Record 1:
245 10 |a Libellus |h [microform] / |c F. Barholomei de Vsingn Agustiniani de falsis prophetis tam in persona quã doctrina vitandis a fidelibus. De recta et mũda predicatiõe euãgelij & quibus conformiter illud debeat predicari. ...
264 _1 |a Erphurdie [i.e. Erfurt] : |b [Matthes Maler], |c 1525.
300 __ |a 79 pages (4to) ; |c cm.
336 __ |a text |b txt |2 rdacontent
337 __ |a microform |b h |2 rdamedia
338 __ |a microfiche |b he |2 rdacarrier
500 __ |a Signatures: A-K4.
500 __ |a Title within ornamental border.
510 4_ |a Panzer (Annales typographici) |c VI: 503, 63
510 4_ |a Kuczyński |c 2681

Record 2:
245 10 |a Libellus |h [microform] / |c F. Bartholomei de Vsingen Augustiniani de Merito bonorum operum. In quo veris argumentis respondet ad instructionem fratris Mechlerij Franciscani de bonis operibus. quam inscribit christianã. ...
264 _1 |a Erphurdie [i.e. Erfurt] : |b [Mathes Maler], |c 1525.
300 __ |a 70 pages (4to) ; |c cm.
336 __ |a text |b txt |2 rdacontent
337 __ |a microform |b h |2 rdamedia
338 __ |a microfiche |b he |2 rdacarrier
500 __ |a Signatures: A-I4.
500 __ |a Title within ornamental border.
510 4_ |a Panzer (annales typographici) |c VI: 503, 62
26. Other side effects - Our digitized books from the Rose Library special collections (not in Alma) no longer dedup with the source physical book records from Alma - even though we retained the record ID in the digital metadata.
30. Why?
No identifiers in the separate records that could break the dedup.
245 (Title) subfield n (number of part) or p (name of part) for the volume number doesn’t have enough weight to lower the score enough.
31. Solution - not nice
Add the MMSID for each Alma record for the 12 volumes to the dedup rule so it will get a t99 “do not dedup” value.
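The workaround amounts to a hard-coded exclusion list in the normalization step. A hypothetical sketch, with invented MMSIDs standing in for the 12 real ones:

```python
# Records whose MMSID appears here get the PNX dedup type t=99
# ("do not dedup"). The IDs below are invented placeholders.
DO_NOT_DEDUP_IDS = {
    "9936550000002486",
    "9936550010002486",
    # ... one entry per volume that must not dedup
}

def dedup_type(record_id, default_type="1"):
    """Return the PNX dedup <t> value for a record."""
    return "99" if record_id in DO_NOT_DEDUP_IDS else default_type
```

"Not nice" indeed: the list has to be maintained by hand whenever volumes are added, and a renormalization is needed before the change takes effect.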
32. Same title same year (work in progress?...)
Both published 1999. Same composer and work (Chopin, Piano Concertos nos. 1 & 2).
Artists (Arthur Rubinstein, Martha Argerich) are not part of the dedup algorithm!
● Idea: add mapping of 024 or 028 or 037 (publisher numbers; repeatable, not consistently formatted, not “universal”) as the Universal ID (F1)
● Support suggested: add the record ID to F1 (Universal ID) as the last “or” choice, to subtract points/prevent dedups

The American movie directed by Steven Seagal and the Chinese-language movie directed by Corey Yuen with the same title were issued in 2009. I couldn’t find a thumbnail of our copy of the Yuen movie, which is a videodisc.
33. 6. More Dedup problems at RMIT University (Amelia)
Genki
● Two records with a single number different in the titles
● The number is displayed in roman numerals, I and II
● Primo was deduping the records and only displaying title metadata related to Genki II
● Users couldn’t find Genki I
34. Screenshot of the DeDup test in Primo BO. This is how we identified that the title field was matching.
35. Solution = Changed roman numerals in title (245 $a) to numerical representation
For example: 246 $a Genki 2
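The manual fix above could also be scripted. A minimal converter, assuming only short roman numerals (in the I-X range, as in series numbering like Genki I/II) appear at the end of these titles:

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10}  # enough for small series numbers

def roman_to_int(numeral):
    """Convert a small roman numeral (subtractive notation) to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    return sum(
        -v if i + 1 < len(values) and v < values[i + 1] else v
        for i, v in enumerate(values)
    )

def arabicize_title(title):
    """Replace a trailing roman numeral in a title with its arabic form."""
    return re.sub(
        r"\b([IVXivx]+)\s*$",
        lambda m: str(roman_to_int(m.group(1))),
        title,
    )
```

Normalizing both titles to the same numeral style makes the two records distinguishable to key-based matching instead of silently collapsing into one cluster.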
37. 7. Primo “FRBR clustering” (Nathalie)
● A simpler algorithm
● Uses author-title (or title-only) keys to create clusters of records for a work.
● In the FRBRization part of a pipe, if a match is found based on the keys, the record is added to the same FRBR group.
38. FRBR matching
FRBR vector (simplified explanation)
K1 - Author part key (fields 100 or 110 or 111, OR 700, 710, 711)
K2 - Title-only key (field 130)
K3 - Title part key (not serials: 240 and 245; serials: 240, or 245 if 240 does not exist)
● Not all subfields are used.
● Normalization to remove punctuation, change to lowercase, etc.
● K1 and K3 are combined for matching; K2 is not.
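The key construction just described can be sketched as follows. This is a deliberately simplified illustration: real Primo normalization rules differ in detail, and the MARC field selection is reduced here to a plain dict with assumed "author" and "title" entries.

```python
import re

def normalize_key(text):
    """Lowercase, drop most punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s&]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def combined_key(record):
    """Build the combined title~author (K3~K1) matching key,
    or None if either part is missing."""
    author = record.get("author")
    title = record.get("title")
    if author and title:
        return f"{normalize_key(title)}~{normalize_key(author)}"
    return None
```

Because the author is part of the combined key, editions of the same work with different authors get different keys and never cluster, which is exactly the Riley on Business Interruption Insurance problem shown later in the deck.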
39. FRBR problems (Nathalie, Bodleian Libraries)
● Records that you want to cluster, that don’t
● Records that cluster, that you don’t want to
● Sort order within clusters
(Examples are from http://solo.bodleian.ox.ac.uk - which has FRBR turned on, but not dedup)
41. FRBR problems (Nathalie)
FRBR section of the PNX records
Print record
<k3>$$Kjournal of women politics and policy$$AT</k3>
Key used for matching: none
Electronic records
<k2>$$Kjournal of women politics & policy online$$ATO</k2>
<k3>$$Kjournal of women politics and policy$$AT</k3>
Key used for matching: journal of women politics & policy online
43. FRBR problems (Nathalie)
FRBR section of the PNX records
9th and 10th editions:
<k1>$$Kroberts harry$$AA</k1>
<k3>$$Kriley on business interruption insurance$$AT</k3>
Key used for matching: riley on business interruption insurance~roberts harry
7th and 8th editions:
<k1>$$Kcloughton david$$AA</k1>
<k1>$$Kriley denis$$AA</k1>
<k3>$$Kriley on business interruption insurance$$AT</k3>
Keys used for matching: riley on business interruption insurance~cloughton david
riley on business interruption insurance~riley denis
45. FRBR problems (Nathalie)
FRBR section of the PNX records
Print record - incorrect metadata! (245 14 $a Three sisters)
<k1>$$Kcaldwell lucy 1981$$AA</k1>
<k3>$$Ke sisters$$AT</k3>
Electronic Record
<k1>$$Kcaldwell lucy 1981$$AA</k1>
<k3>$$Kthree sisters$$AT</k3>
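The mangled "e sisters" key above comes from the MARC 245 nonfiling-characters indicator: a value of 4 tells the system to skip the first four characters of the title when building keys, which is right for "The three sisters" but wrong for "Three sisters". A minimal sketch (the real key-building involves more normalization):

```python
def title_key(title, nonfiling=0):
    """Build a simplified title key, skipping `nonfiling` leading
    characters as the MARC 245 second indicator directs."""
    return title[nonfiling:].lower()
```

Since the print and electronic records produce different keys, the two records for the same play never cluster until the indicator is corrected.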
46. FRBR problems (Nathalie)
● Records that cluster, that you don’t want to
○ This is subjective!
○ The normalization rules can be used to exclude records from clustering by assigning “<t>99</t>”
● Oxford case-study
○ Excluded from clustering: printed maps, printed music, sound recordings, video recordings, computer software, and printed books prior to 1830.
○ Individual records can also be excluded by adding a local field to the Aleph record (which is used by the normalization rules).
47. FRBR problems (Nathalie)
● Sort order within clusters
○ Set in the Back office.
● Oxford case-study
○ At Oxford we have chosen relevance, as that works best for people doing known-item searches: the result they want will usually be the first record in the cluster.
○ However, Date-newest would be preferable in some situations (e.g. multiple editions of a textbook).
○ Sometimes the most “relevant” record is not what you would expect...
50. 8. FRBR problems (Amelia)
FRBR clustering unexpectedly not occurring - e.g. because of minor differences in cataloging
51. Solution (to be implemented)
Add transformations to Normalization rules - FRBR Section
(thank you, Nathalie, for the solution to this problem)
52. More FRBR problems (Amelia)
Tecnica dei modelli
● Fashion series split into 3 volumes
● Each volume has its own Alma record
● Primo was clustering the records and only displaying the $n information for
volume 3 in the search results
● Users couldn’t find volumes 1 and 2
53. Solution: Preventing FRBR (Amelia)
Add t=99 for records with the series title
240 $a Tecnica dei modelli
54. Other FRBR problems (Amelia)
● User understanding
○ How much do users understand about clustering?
○ How much do they need to know?
● Staff training requirements
○ How much do staff understand about clustering?
○ How much do they need to know?
■ Enough to help the users
55. DeDup : Classic Primo and New Primo
Above: Screenshot of deduped item in Classic UI
Below: Screenshot of deduped item in New UI
56. FRBR : Classic Primo and New Primo
Above: Screenshot of clustered item in Classic UI
Below: Screenshot of clustered item from New UI
57. Summary of issues with Primo
● 245 $n and $p not given enough weight
● Inability to DeDup or Cluster across all collections (example: Alma and PCI)
● Matching depends on textual strings in the metadata - this can have errors or
legitimate variations
● Deduping should not happen for rare book cataloging
● Lack of control on choice of the “merged record” for Deduping
● Lack of reliable identifiers in records especially for media….
● Lack of control...
58. The Future...
● New field approved to be added to MARC for work identifiers (URIs): 758
● Linked Data! If you define an Entity… it must have an Identifier (URI: URL
or URN).
● RDA/FRBR “Work” vs BIBFRAME “Work” (RDA Expression?)
● Not clear where the overlaps or agreements are in version 2.0
● BIBFRAME still being refined
59. Questions:
How might we address problems with deduping and FRBR clustering?
Should the algorithms be modified?
Should Work and Expression identifiers be generated on-the-fly in Alma and
Primo, or be generated once, be stored and be editable?
Is Primo Dedup merged display best for users? What other approaches might
work better?
60. Contacts:
Laura Akerman, Discovery Systems and Metadata Librarian, Emory University
liblna@emory.edu
Nathalie Schulz, Systems Analyst, Bodleian Libraries, University of Oxford
Nathalie.Schulz@bodleian.ox.ac.uk
Amelia Rowe, Applications Librarian, RMIT University
amelia.rowe2@rmit.edu.au
61. Credits:
● Opening image: NASA, Hubble Space Telescope image, Gas Clouds and Star Clusters, NGC 1850.jpg
● Image from Cutter, Charles A.,1837-1903, Rules for a printed dictionary catalogue. Washington: Government
Printing Office, 1875, retrieved from Hathi Trust, https://catalog.hathitrust.org/Record/009394960
● Frank Sinatra and Martha Argerich album cover and Above the Law (Segal) DVD cover thumbnails from
Amazon.com
● Artur Rubinstein album cover thumbnail from Discogs.com
● Above the Law (Yuen) DVD thumbnail from Internet Movie Database
Editor’s notes
(Laura) The origin of this talk is, I started receiving a spate of dedup issues reported by other librarians at Emory University and thought it’d be an interesting topic. But I wanted to take a higher level view of the process and wanted to include FRBR which we don’t have experience of. So I put out a call to collaborate on the Primo list and was delighted to find great collaborators in Nathalie Schulz from the Bodleian Library, Oxford University, and Amelia Rowe from RMIT University in Melbourne, Australia.
Before we get into the juicy problems, I will start off with a little background - bear with it… Librarians have been bringing descriptions together for a very long time to assist users to find what they want.
Bringing all versions of a work together before online catalogs involved arranging cards for different versions to be found together in the card catalog, as well as on the shelf thanks to the classification numbers assigned to books. Works were generally found/identified in the card catalog by author and title, but if there was any ambiguity, a uniform title could be constructed which would be unique in combination with the author name. Certain special authors might have a more elaborate arrangement into sections by language, special sections for complete and selected works, and compilations by form (e.g. “Poetical works”). This led to different sorts of uniform titles.
In 1998 the International Federation of Library Associations published Functional Requirements for Bibliographic Records. This was the result of intense committee work to develop metadata requirements for libraries at the national level. The group analyzed both the user tasks that needed support and a definition of the conceptual entities that lay behind those tasks and their relationships. This was really a new conceptualization of description for discovery of information resources.
Some of the things we think users do include: determining if an electronic version of the same text and edition that exists in print is available through the library - or vice versa (some users prefer print); determining if a particular described sound recording contains a particular song; finding a specific rare printing of a book described in a specialized bibliography
So in this presentation it is good to look at the FRBR Work and Expression entities and the differences between them. Work is abstract and something that could have grey areas - I am thinking of some manuscripts and serially published things... It’s a mental idea of all the versions of what we think of as being “the same work”.
This is a bit of ancient history for most libraries today, but in a dictionary catalog, alphabetical arrangement of titles (within an author section of the catalog if there was an author main entry) would bring together different editions of books.
Merging and deduping of records for “same content” was needed when large scale union catalogs, incorporating records from many sources, became possible with library automation and online catalogs. In an email to me, Karen Coyle, one of the authors of the algorithm, pointed out that author names were not as reliable when this was developed, due to a mix of old and new cataloging rules (AACR1 and AACR2), and that times have changed. This algorithm was the source for the dedup algorithm in Primo.
I’ve not had the time to do deep research into other system models but wanted to note these. VTLS is now owned by Innovative Interfaces; their literature speaks of users being able to search once and retrieve all related versions of a work including those with variant titles and different languages. OCLC has been developing algorithms to identify “work entities” and associate them with clusters of records; these identifiers are available under an Open Data License and can be found in the linked data section of
So this is not a full tutorial but I’m just going to hit some highlights about how dedup works in Primo. The approach is based on the California Digital Library’s approach to merging “duplicate records” from its database. Only instead of weeding out duplicates (which you should do in your ILS system before things get to Primo!), the idea is to combine descriptions of essentially the same content that may have different formats - e.g. electronic, microform, print; CD vs. streaming audio; etc.
Here’s where the tricky part comes in. There really isn’t a science behind assigning a score to each combination of two records and calling it a match if it meets a certain threshold. That was developed through trial and error by California Digital Library.
This is just a small part of the algorithm that, if you have server access to Primo, you could find and edit. You can see that it assigns point scores for various kinds of matches in data elements between two records. I think we have tried editing this a couple of times but have not had much success; if others have, I would be interested to hear your experiences. This file is not protected from being overwritten by updates, as far as I know.
Here you have your average normalization rule for dedup. Field “F5” contains a title. The condition says this rule is only for “non-serials”. It takes the title 245 field, including title, subtitle, number and part. Then it heavily normalizes the string to remove initial articles and various punctuation - brackets, ampersands, etc. There’s another rule for F5 for serials - it operates on the 022 field subfield z (invalid ISSN). These rules can be modified - at your own risk!
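A rough Python equivalent of that kind of title normalization (illustrative only - the real rules are more extensive and live in Primo’s normalization rule files):

```python
import re

# Illustrative sketch of the heavy normalization applied to the 245
# title before it is stored in the F5 dedup field; the article list
# and punctuation handling here are simplified assumptions.

INITIAL_ARTICLES = ("the ", "a ", "an ")

def normalize_f5(title):
    t = title.lower().strip()
    for article in INITIAL_ARTICLES:          # drop an initial article
        if t.startswith(article):
            t = t[len(article):]
            break
    t = t.replace("&", "and")                 # fold ampersands
    t = re.sub(r"[^\w\s]", " ", t)            # strip remaining punctuation
    return re.sub(r"\s+", " ", t).strip()     # collapse whitespace

print(normalize_f5("The Journal of Women, Politics & Policy"))
# → "journal of women politics and policy"
```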
The DEDUP Test utility is a wonderful tool for understanding the complex process by which the dedup stage of record loading determines whether two records match and assigns them a dedupmrg number and match ID - or not. This happens in two stages. Title, date and record identifier get checked and basically, if the record IDs don’t match but the title and date do, it goes to full comparison. This is where “points” are assigned for full or partial matching, or subtracted for non-matching. Notice that the short title matches here and that gets 450 points
Notice that the long title doesn’t match completely, but enough words match so that it still gets 400 points. More about this later...
I regret I don’t have a screenshot of the deduped record here, but imagine all of these records deduping together. The date matching actually allows a couple of years variance. 25 points are subtracted for lack of exact match within 2 years, not enough to prevent the deduping in some cases.
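The two-stage scoring described above might be sketched like this; the 450/400/-25 point values come from the examples in this talk, while the threshold, overlap test, and other values are assumptions:

```python
# Hypothetical sketch of the "full comparison" scoring stage. The
# 450/400 points and the 2-year date tolerance come from the talk;
# the threshold, overlap test, and other values are assumptions.

def score_pair(a, b, threshold=775):
    """Return (score, is_match) for two candidate records."""
    score = 0
    if a["short_title"] == b["short_title"]:
        score += 450                     # exact short-title match
    wa, wb = set(a["long_title"].split()), set(b["long_title"].split())
    if wa == wb:
        score += 500                     # assumed full long-title match
    elif len(wa & wb) / len(wa | wb) >= 0.7:
        score += 400                     # enough words match for partial credit
    gap = abs(a["year"] - b["year"])
    if 0 < gap <= 2:
        score -= 25                      # near-miss dates cost a little
    elif gap > 2:
        score -= 450                     # assumed heavy penalty beyond 2 years
    return score, score >= threshold

a = {"short_title": "invisible man",
     "long_title": "invisible man a novel", "year": 1952}
b = {"short_title": "invisible man",
     "long_title": "the invisible man a novel", "year": 1953}
print(score_pair(a, b))  # → (825, True)
```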
I tried various less drastic changes, but they weren’t working. Our Special Collections library (now called The Stuart A. Rose Library for Manuscripts and Rare Books) was impatient - a class was going to study this autobiography and its editions, and they couldn’t abide Primo clumping them together (and a lot of other rare editions as well). This is just one example of many. So we ultimately added a rule to give all records with an item in that library a “do not dedup” value of t 99. Later, we did the same for the special collections location in the Theology library.
But rare books in microform or electronic collections are still clumping...
We want our rare books that we have digitized to dedup with the digital version, but they no longer do so - this is a tradeoff we’ve not found a solution for.
Twelve fabulous recordings of “Ol’ Blue Eyes” in our collection. Here’s a somewhat fuzzy image of the cover of Vol 7, noting the songs “Night and Day”, “But Beautiful”, “The Song is You”, and “What’ll I Do?”
When I searched for Frank Sinatra the Columbia Years in our Primo, I got two results - the record for the set, and a record for “Vol. 10 the Complete Recordings”. What happened to volume 7? If we look at the item details, we can see what happened. Primo deduped all the volumes. So only the contents note of vol. 10 displays.
Viewing the PNX in the PNX viewer and clicking on “Match ID” confirms this
The first problem, I neutralized in Production by adding “t 99” to these record IDs. Someone reported the second one - the videos - while I was at this conference. There are publisher identifier numbers in tags 024, 028, or 037 that could be added, but these fields are repeatable, there may be inconsistencies in the formatting, and these numbers aren’t universal. I’m nervous about using them. The suggestion to add the Alma record ID to F1, the Universal ID field to which the 010 (LC card number) is mapped, might result in breaking this dedup, but how many others that we do want to happen would it break? We will test these approaches but are not optimistic. Now I’m going to turn this over to Amelia Rowe, who’ll tell you about more fascinating dedup problems at RMIT University.
Items not deduping between collections/pipes is the greatest cause for confusion for our users.
We can teach them about DeDup and clustering, but if they see a behaviour that doesn’t match what they’ve been taught, they get confused and think something is wrong.
Pipe related notes:
Because Dedup/FRBR is done at the pipe level, there is always content that isn’t deduping/FRBRizing as the end user would expect.
At RMIT we ingest resources from a variety of locations (with 10 active pipes)
Some resources may be available between multiple pipes, and/or in PCI.
When some records don’t dedup but others do this causes confusion especially for staff.
Screenshot = a record that is the same in our Research Repository and our Research Bank, yet the two don’t dedup because they are in different pipes.
In this instance the TN_rmit_res33432 record is an RMIT-originating record that has found its way into the PCI
Note: users don’t see the record ID; I have used a bookmarklet to display this for my own purposes.
There is no fix/solution for this
Records are assigned to FRBR clusters in Primo as part of the pipe. Keys based on the author(s)/titles are compared with other records and if a match is found the record is then added to the same FRBR group. A record can only be part of one FRBR group.
There are three different types of keys in the FRBR Vector:
Author part - uses the “main entry” (1XX fields) and if this is not present the added entry author fields (7XX)
Title only key - uniform title from field 130
Title part key - uniform title from field 240 and title from 245 (except for serials which have a 240). There are other fields included in the normalization rules for when there is no 240 or 245 field but as records are rejected from Primo if there is no 245$a, these will rarely be used.
Not all subfields are used, e.g. subfield $l (Language) in the 240 field is not used, which allows the original and translations of a work to cluster together.
The Author part keys and Title part keys are combined to make strings for matching, while the Title only key is used on its own.
There is a detailed explanation on the Ex Libris Customer Centre at: https://knowledge.exlibrisgroup.com/Primo/Product_Documentation/Technical_Guide/040FRBRization/010The_FRBR_Vector
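A simplified sketch of the key construction described above (the tag-to-values dict stands in for a real MARC parser, and the normalization is a toy version - both are illustrative assumptions):

```python
# Simplified, hypothetical sketch of building the three FRBR vector
# keys from a record, per the description above; not a real MARC
# parser and not Ex Libris's actual normalization.

def norm(s):
    return " ".join(s.lower().replace(",", " ").split())

def frbr_vector(marc):
    vector = {"k1": [], "k2": [], "k3": []}
    # Author part: main entry (1XX), else the added entries (7XX).
    for author in marc.get("100") or marc.get("700") or []:
        vector["k1"].append(norm(author))
    # Title-only key: uniform title from field 130.
    for title in marc.get("130", []):
        vector["k2"].append(norm(title))
    # Title part key: 240 uniform title (with $l dropped, so that
    # translations share a key with the original) plus the 245 title.
    for tag in ("240", "245"):
        for title in marc.get(tag, []):
            vector["k3"].append(norm(title))
    return vector

rec = {"100": ["Roberts, Harry"],
       "245": ["Riley on business interruption insurance"]}
print(frbr_vector(rec))
```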
Sometimes records simply do not have enough information to create matching keys. In this case the print record does not have any author information so there is only a title key. The two online titles that cluster have the same uniform title.
The print record only has a title part key, and this is not used for FRBR matching on its own.
The electronic records have a k2 field, which is used on its own for matching.
Note: the keys used for matching are stored in the p_frbr_keys table.
If the author changes between editions (which often happens with legal works), the keys won’t match and so they do not cluster.
The 9th and 10th editions have a 100 field for “Roberts, Harry”.
The 7th and 8th editions have 700 fields for “Cloughton, David” and “Riley, Denis”.
In this example the reason for the records not clustering is not immediately apparent.
Primo can only work with the metadata as found in the records - if this is incorrect, as in this case (non-filing indicator), the records will not cluster.
The University of Oxford has had Primo since 2008, but until mid-2011 when we moved to Aleph there was also a separate OPAC. Moving to “Primo only” meant staff in the libraries started to look more closely at the clustering. As part of a review we trialled (in a test version) turning off clustering. After staff testing and usability testing the decision was made to go with partial clustering and there have not been any calls to change this.
The normalization rules to exclude clustering make use of fields from the Aleph records, both standard FMT fields and local RTP (Record Type) fields which is how we identify most of the pre-1830 books. We also have a local “SOL” field that we can use to exclude individual records from clustering.
When we were reviewing clustering at Oxford, we had the default sort order set to “Date newest”. Some of the complaints that people had about clustering were because the specific record they wanted could be hard to find within a cluster. Changing to sorting by “relevance” helped with this. However, there are times when the “relevant” record is unexpected.
In this example, Primo is considering the Spanish translation to be the most relevant and is presenting it as the “top” record in the cluster
See the screenshot for an example of two records not deduping (record 1 and record 3) because of the ampersand in the title
Record 1 is the 5th and 3rd editions FRBRized
Record 3 is the 2nd and 4th editions FRBRized
Users would expect all of these records to be found under the one record
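A tiny illustration of how an unfolded ampersand produces different dedup keys (the normalization shown, and the title used, are assumptions for illustration, not Primo’s actual rule):

```python
# Toy illustration of the ampersand problem: if normalization does
# not fold "&" into "and" (an assumption about the behaviour seen
# above), otherwise-identical titles get different dedup keys.

def naive_key(title):
    # keeps letters, digits, spaces and "&"; strips other punctuation
    kept = "".join(c for c in title.lower() if c.isalnum() or c in " &")
    return " ".join(kept.split())

k_amp = naive_key("Equity & Trusts")      # hypothetical title
k_and = naive_key("Equity and Trusts")
print(k_amp == k_and)  # → False: the records get different keys
```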
In this example the FRBR clustering is correct; however, the confusion it caused meant we had to change the system’s behaviour for users
Note: Setting t=99 is a solution we typically try to avoid as it risks creating very complicated frbr:t normalization rules
Typically we try to edit the cataloguing to prevent dedup
While those of us who work closely with Primo have a concept of FRBR, what it is, and how it works in Primo, the majority of library staff and our users do not have this understanding. This makes the display in Primo confusing for users.
Have you tried explaining FRBR to your staff or users? I’ve explained it many times to staff and still there is not a clear understanding of what it is within Primo.
To help overcome this confusion staff at RMIT are working on an online IST (in-service training) module specifically related to DeDup and FRBR to help educate our staff who can in turn help the users.
Sharing how dedup and Clustering are presented in New vs Classic Primo UI
In some instances the New UI is more user friendly in the way it presents the records
E.g. deduped print and electronic material appears in the same full record instead of in separate tabs.
In the Classic UI, the Versions link was easily lost in the top right-hand corner of the record.
Now it is part of the record’s availability.
Note: we are thinking of changing the terminology from “versions” to “editions and formats” in the hope of using terminology that better explains the functionality to our end users.