This document discusses the relationship between VIAF (Virtual International Authority File) and ISNI (International Standard Name Identifier). It provides an overview of ISNI's growth and member sources. VIAF and ISNI work together to synchronize data while maintaining separate scopes. The ISNI Quality Team works to correct errors by merging duplicate records, splitting mixed identities, and notifying data sources. Recommendations are made to improve interoperability between VIAF and ISNI.
VIAF and ISNI Synchronization for Maintaining Accurate Identity Clusters
1. VIAF Global Council - Lyon, France 15 August 2014
VIAF and ISNI
Synchronisation
Janifer Gatenby
EMEA Program Manager Metadata
2. cross-domain bridging-domains
Libraries
Text Rights
Trade Sources Music Rights
Encyclopaedias
Researchers & Professional
Granting organisations
Professional Societies
Article databases
Theses databases
Archives and
Museums
3. ISNI Status at July 2014
• 8.01 million assigned ISNIs (was 1 million 2 years ago)
• 15.4 million links; ISNI as linked data
• ORCID Registration process is accessing ISNI
• New members: Harvard University, La Trobe University and COPYRUS (Russia)
• Linked Content Coalition names ISNI as #1 strategy
Domain              Databases   Assigned      Links
Research            12          836,142       1,845,165
Text rights         7           129,816       692,580
Music               5           315,918       450,717
Libraries & trade   4           6.8 million   12,356,010
Organisations       3           446,237       109,204
4. Current ISNI Sources 30…and growing
GENERAL SOURCES
Bowker Books in Print BOWKER
The European Library (48 national libraries) TEL
Virtual International Authority File (33 libraries) VIAF
RIGHTS MANAGEMENT
Access Copyright, Canada ACCE
Authors’ Licensing and Collecting Society, UK ALCS
Centrum Dienstverlening Auteurs- en aanverwante Rechten, Netherlands CEDA
Centro Español de Derechos Reprográficos CEDR
Irish Copyright Licensing Agency ICLA
Prolitteris, Switzerland PROL
VG WORT, Germany VGWO
MUSIC
American Musicological Society AMS
British Library Sound Archive BLSA
International Performers’ Database Association IPDA
MusicBrainz MUBZ
RESEARCHERS AND PROFESSIONALS
American Musicological Society AMS
Authors Guild AGLD
British Library Theses BRTH
Digital Author identifier, Netherlands DAI
Jisc Names Project, UK JNAM
La Trobe University AU:VLU
Modern Language Association MLA
OCLC Theses OCLCT
ORCID and DataCite Interoperability Network ODIN
AuthorClaim and RePec OPENL
Proquest Theses PROQ
Scholar Universe, Proquest SCHU
Electronic tables of content ZETO
ORGANISATIONS
American Chemical Society ACS
Boekenbank, Belgium BOEK
Bowker Publishers BOWP
Publishers Licensing Society, UK PLS
Ringgold RING
5. VIAF and ISNI are Complementary
VIAF Scope
• Persons
• Organisations
• Works / uniform titles
• Expressions
• Meetings
• Geographic
• All public data
ISNI Scope
• Persons
– + musicians, researchers
• Organisations
• (excluding sparse)
• (excluding undifferentiated)
• Includes private data
6. VIAF and ISNI are Complementary
VIAF Role
• Ingest authority records from the world’s major national and research libraries
• Make clusters
• Expose and diffuse
ISNI Role
• Create permanent IDs
– By batch
– On demand
• Diffuse those IDs
– Libraries, trade, rights management, professional societies, educational institutions
7. VIAF and ISNI are Complementary
VIAF System
• Harvester
• Clustering mechanism (re-clustered monthly)
• 5 web interface languages
• Download in multiple formats
• Linked data & SRU
1 million personal visitors p.a.
ISNI System
• Batch load
• Online request API
• Web site (English only)
– Allows end user input
– Member input and correction
– 16+ indexes
• SRU; linked data
• Quality Team monitoring & correcting
• Diffusion, including corrections
8. Synchronisation ISNI to VIAF
2012
• ISNI / VIAF identifiers
2013
• Full records; ISNI a VIAF source
2014
• ISNI records, verification mark
9. VIAF ingest into ISNI
• VIAF provides full file each month
• ISNI compares previous & current files & creates separate files for processing
– Deletes (VIAF cluster ID in old but not new)
• If assigned or has other sources, source becomes ISNI
– Contents changed
– Sources added or deleted
– New (VIAF cluster ID in new but not old)
– Re-matches VIAF deletes
• VIAF cluster movement reports for BL and BnF
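The comparison step on this slide can be sketched as a diff of two monthly snapshots keyed by VIAF cluster ID. This is only an illustration under stated assumptions: the real VIAF file format and the ISNI load programs are not shown in the slides, so `ClusterRecord`, `diff_snapshots`, the cluster IDs and the sample data below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRecord:
    """Hypothetical, simplified view of one VIAF cluster in a monthly file."""
    sources: frozenset  # contributing source codes (illustrative values)
    content: str        # stand-in for the rest of the record's data


def diff_snapshots(old, new):
    """Split two monthly snapshots (dicts keyed by VIAF cluster ID)
    into the kinds of processing files listed on the slide."""
    result = {"deleted": [], "new": [], "sources_changed": [], "content_changed": []}
    # Deletes: cluster ID in the old file but not in the new one.
    for cluster_id in old.keys() - new.keys():
        result["deleted"].append(cluster_id)
    # New: cluster ID in the new file but not in the old one.
    for cluster_id in new.keys() - old.keys():
        result["new"].append(cluster_id)
    # Clusters present in both files: check sources first, then content.
    for cluster_id in old.keys() & new.keys():
        before, after = old[cluster_id], new[cluster_id]
        if before.sources != after.sources:
            result["sources_changed"].append(cluster_id)
        elif before.content != after.content:
            result["content_changed"].append(cluster_id)
    for ids in result.values():
        ids.sort()
    return result


old = {
    "100": ClusterRecord(frozenset({"BNF"}), "Smith, John"),
    "200": ClusterRecord(frozenset({"DNB"}), "Doe, Jane"),
    "300": ClusterRecord(frozenset({"BL"}), "Roe, Richard"),
}
new = {
    "200": ClusterRecord(frozenset({"DNB", "BL"}), "Doe, Jane"),
    "300": ClusterRecord(frozenset({"BL"}), "Roe, Richard, 1950-"),
    "400": ClusterRecord(frozenset({"BNF"}), "Smith, John"),
}
changes = diff_snapshots(old, new)
```

In this toy data, cluster 100 disappears while its content reappears under the new ID 400. That is the sort of case a re-matching pass over VIAF deletes would presumably be meant to catch, though the sketch does not attempt the re-match itself.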
12. End User Note
Dear Sir / Madam, The ISNI 0000000117488848 refers to "Marco Antonio Casanova", Professor at the Catholic University of Rio de Janeiro. I am not the author of "Fragmentos póstumos. - Nietzsche uma introdução filosófica" or "Segunda consideração intempestiva da utilidade e desvantagem da história para a vida". The author of these works is "Marco Antonio dos Santos Casa Nova". You may confirm this information by consulting our CVs at the Brazilian Research Council:
Marco Antonio Casanova (me): http://lattes.cnpq.br/0400232298849115
Marco Antonio dos Santos Casa Nova (the other author): http://lattes.cnpq.br/3409704326617178
13. Correction – Source Error
• Reply to End User
Thank you for using the ISNI database and suggesting improvements to your record. There is now another ISNI record for Marco Antonio dos Santos Casa Nova (ISNI 0000 0004 3077 6045).
I have corrected your record, removed the erroneous titles and added a link to your online CV (Lattes database).
If you have any further queries, please let me know.
• Email to Source
I am part of the ISNI Quality Team (experts from the British Library and Bibliothèque nationale de France in charge of the quality of the ISNI database). We perform manual checking and corrections in the ISNI database such as splits, merges/deduplications and data corrections. The ISNI Quality Team received a request from an end user about ISNI records 0000 0001 1748 8848 and 0000 0004 3077 6045, VIAF 19998588 and their related
Authority record XXX 109895029 mixes 2 identities (see the snapshot below):
1/ Marco Antonio Casanova (ISNI 0000 0001 1748 8848)
2/ Nova, Marco Antonio dos Santos Casa (ISNI 0000 0004 3077 6045)
Philosoph, and author of "Segunda consideração intempestiva da utilidade e desvantagem da história para a vida"
I hope this information will be useful.
14. Correction – Cluster Error
• ISNI marks its two records as verified & sends to VIAF
• These records are given the same status as XA records in VIAF clustering.
• No two XA records may occur in the same cluster
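The rule that no two XA records may share a cluster can be pictured as a simple check over candidate clusters. The sketch below is illustrative only: the slides do not show how the VIAF clustering software represents clusters or enforces the rule, so the data layout, the function name `xa_conflicts` and the record identifiers are assumptions.

```python
def xa_conflicts(clusters):
    """Return the IDs of clusters that contain more than one XA-status record.

    `clusters` maps a cluster ID to a list of (record_id, is_xa) pairs.
    This layout is a hypothetical simplification, not the VIAF data model.
    """
    conflicts = []
    for cluster_id, members in clusters.items():
        xa_count = sum(1 for _record_id, is_xa in members if is_xa)
        if xa_count > 1:
            conflicts.append(cluster_id)
    return sorted(conflicts)


clusters = {
    # One verified ISNI record plus an ordinary library record: allowed.
    "A": [("isni-1", True), ("lib-7", False)],
    # Two verified ISNI records in one cluster: violates the rule.
    "B": [("isni-1b", True), ("isni-2", True), ("lib-9", False)],
}
conflicts = xa_conflicts(clusters)
```

Because the two split ISNI records are both marked as verified, a check of this kind would keep them from being re-clustered together in a later monthly run.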
15. End User Note
• It seems 2 ISNIs has been assigned to the French singer Laïka Fatien (born 1968 in Paris): ISNI 0000 0000 8065 8419 and ISNI 0000 0000 7238 637X. I think the last one can be deleted.
16. Correction – Merged duplicate
• Reply to End User
Thank you for using the ISNI database and providing us with information about the duplicate records for Laïka Fatien.
There is now just one record on the ISNI database for this identity – ISNI: 0000 0000 8065 8419.
If you have any further queries, please let me know.
• Notification to VIAF via ISNI record
• ISNI record contains verification note (i.e. treat as XA)
• ISNI record contains 2 VIAF cluster identifiers
[Diagram: VIAF A and VIAF B = one ISNI record carrying both VIAF A and VIAF B]
17. ISNI Quality Team
• Samples data regularly
– c. 2% of VIAF clusters have mixed identities
– The rate of duplicate clusters is higher, nearer 5%
• Makes corrections at cluster level
– Merges, splits, error notifications
– Access to cataloguing client / macros
• Makes system recommendations
• Gives approval for single source assignment
• Responds to End User input
• Sends emails to sources for error correction (12 VIAF sources currently participating)
18. ISNI System Notification (Push process)
• Someone else has matched & details
• You probably need to take action
19. ISNI Assignment Agency
• Matching, merging and splitting infrastructure
• Correction of errors
• Sampling and anomaly checks
– e.g. date anomalies, unlikely mixture of sources
• Pseudonym splitting
• Re-importing and re-matching
• Diagnostic indexes and reports
• Enrichment
– e.g. Wikipedia, Dewey
• Notification system
20. VIAF ISNI Interoperability Task Force
• Met in Paris 22-23 April 2014
• Representatives from
– Bibliothèque nationale de France
– Biblioteca Nacional de España
– British Library
– Deutsche Nationalbibliothek
– Sudoc
– OCLC (VIAF system)
– OCLC Leiden (ISNI Assignment Agency)
21. Recommendations to VIAF at OCLC
• Use profession and other disambiguating data
• Investigate making an anomaly report
• Investigate changing the clustering rules to flag and prevent a record with a mixed identity from entering the clusters where 2 or more sources have established separate identities
• Investigate changing the clustering rules to prevent duplicate clusters.
• Provide deprecated VIAF Ids in the distributed data
• Treat records from ISNI that are flagged as manual as XA records
• Include ISNI in RDF
• Remove test from ISNI icon
• Only show one name form for ISNI in the wheel
• Investigate why SUDOC titles are not appearing
22. Recommendations to ISNI at OCLC
• Flag manual merges and splits (joint specification to be made)
• Indicate to VIAF that a VIAF source needs to be split from a VIAF cluster
(joint specification to be made)
• Keep up to date with VIAF
• Produce anomaly reports
• Produce notifications to VIAF sources
• [Provide only one ISNI record per VIAF cluster ID; make split off records
ISNI source]
• [Provide records with ISNI source to VIAF]
23. Recommendations to VIAF Council
• Mark undifferentiated authorities or consider not supplying them to VIAF
• Include nationality, particularly for own national identities
• Use VIAF in authority control and select VIAF cluster ID
– Also use ISNI
• If a mixed identity is found in VIAF or ISNI, use either the public interface
or [preferably] the member interface of ISNI to request resolution by the
ISNI Quality Team. All manual corrections made in ISNI will come to VIAF
as records with XA status to ensure merges or splits.
24. Become Involved
Jointly let’s maintain clusters
25. The ISNI Quality Team
• Board members are British Library and
Bibliothèque nationale de France (Representing CENL)
• Seeking Associate Members
– KB, Netherlands in process
– Control own identities
– Access to client maintenance software
– Access to restricted data
– Provide back-up for end user responses
26. ISNI Members
• View whole database (but not restricted fields)
• Access to compare screen; can merge
• Reports on request
– ISNIs – simple report or enhanced
– Cluster movement report
– Diagnostic reports
• Statistics and links
29. Member view – list of additional data displayed (if not private)
• Related identities
• Related persons
• Related organisations
• Nationality
• Gender
• Keyword or key phrase
• Dewey classification
• Publisher
• Dates active
• Associated countries
• Provisional records
– Including links to possible matches, if applicable
30. Private data
• Dates
• Personal Affiliations
• Titles of works
These can be masked from the public and from member view. However, most sources allow titles to be seen by other members to facilitate merging.
31. Do not merge
Anything that looks suspicious:
Report it in a general note and the QT will review
This title
belongs to
This is not the
same person
Multiple domains. ISNI ingests data from these domains and makes links.
ORCID is a self-registration ID system for researchers. During the registration process researchers can access ISNI to import their ISNI and metadata.
Harvard University Library is providing services to publishers for creating author profiles and is including ISNI in the services and software.
La Trobe University has uploaded data from its institutional registry, which is maintained by the library.
COPYRUS is the Russian rights management society, a member of IFRRO.
The Linked Content Coalition http://www.linkedcontentcoalition.org/
LCC Forum Members
Associated Newspapers
Axel Springer
Coordination of European Picture Agencies Stock, Press and Heritage (CEPIC)
Common Rights
Copyright Clearance Centre
Copyright Licensing Agency
Criteria Media Exchange
Danish Producers Association
DDEX
Digimarc
EDItEUR
MovieLabs
Microgen
EMI Music Publishing
Europa Distribution
European Magazine Media Association (EMMA)
European Newspaper Publishers Association (ENPA)
European Publishers Council (EPC)
European Visual Artists (EVA)
European Writers Council (EWC)
Federation of European Publishers (FEP)
Organizações Globo
Gruppo Espresso
Hachette Livre
International DOI Foundation
International Federation of the Phonographic Industry (IFPI)
International Federation of Reproduction Rights Organisations (IFRRO)
International Press Telecommunications Council (IPTC)
International Publishers Association (IPA)
IPR License
ITV
Journaux Francophones Belges
Laurence Kaye Solicitors
Microsoft
News International
Newspaper Licensing Agency (NLA)
Pearson
PLS
Plus Coalition
Reed Elsevier
Rightscom
RTL group
International Association of Scientific, Technical & Medical Publishers (STM)
Universitat de Lleida
Unidad Editorial
Vivere Consulting
These are the current ISNI sources with their codes that appear in the ISNI database.
ISNI’s scope overlaps with VIAF’s but is not identical to it. For persons, ISNI includes everything in VIAF (except sparse and undifferentiated records) plus many persons involved with music and research who are not present in VIAF.
Also, unlike VIAF, ISNI includes private data that may be used for matching but not displayed or diffused publicly. Such data includes dates of birth (actors in particular do not like their dates of birth publicized because it limits the parts that they are offered). Rights management associations are also not permitted to reveal the relationships between real persons and pseudonyms. Witness the recent case of JK Rowling publishing crime novels under a pseudonym and being irked that her cover was revealed by her lawyers.
ISNI’s role is different from VIAF’s. ISNI creates a permanent ID and is required to keep the ID as stable as possible; where it changes, ISNI must diffuse corrections. ISNI diffuses cross-domain: libraries, trade, rights management, professional societies, education.
ISNI includes an online request and maintenance capability.
Improved data quality and confidence
Anomaly reports – 7,000 date anomalies (>50% represent real errors)
Merge, split and data error reports (c. 5,000)
Matching improvements
Dates, common surnames, longest name form, weightings, new elements
Detection of UNIMARC Conversion errors
parallel main names, name variant conversion, related names conversion, missed data
Pseudonyms
Feedback, record links (c. 70,000)
More widely diffused linked data
Proposal for inter-operation – joint notification, shared maintenance
In 2012, ISNIs were sent to VIAF. In 2013, the decision was taken to include ISNI as a source in VIAF, so ISNI started sending full records to VIAF for all assigned records that contained a VIAF code, including all restricted data from non-VIAF sources. In 2014, as well as records containing a VIAF code, records containing an ISNI code are now being sent to VIAF, including those created by the pseudonym programs or created manually by the ISNI Quality Team. The VIAF records that have been edited manually by the ISNI Quality Team contain a verification mark that can be used in the VIAF clustering process.
This slide outlines how ISNI processes its monthly files from VIAF.
There are two types of error in a mixed identity. The clustering software can make an error by erroneously clustering records from two sources, each representing a different person (with the same name). Or a single source record may have mixed identities by listing titles of works that belong to more than one identity (with the same name).
This is a typical input from an end user of the ISNI database. Requests are coming in at an average of 2-3 a day. Almost all are of very high quality, like the one above, and most (to our surprise) include an email address so that we can respond with the action taken. ISNI also undertakes to notify all sources when an error is fixed.
End user requests are stored in non-displayable fields in the ISNI record. Each evening new fields generate email alerts to the ISNI Quality Team. The QT then decides appropriate action, i.e. making links viewable, merging records, splitting records and generating notifications to all sources involved. In this slide, the QT has determined that a single VIAF source is causing a mixed identity and notifies the source by email. The resulting split records are marked as having been verified manually. This becomes a signal to VIAF that they should be treated as special status records by the VIAF clustering program.
In the case above, the QT has determined that a split identity has been caused by a VIAF (or ISNI) cluster error. Two separate records are made with a verification mark. Sources are notified of the cluster change as appropriate.
The ISNI Quality Team plays an essential role in the life of the ISNI database. Not only does it respond to End User input, it also proactively tests the database, looking for sets of records to re-process and recommending improvements to the algorithms. So far the QT has been able to keep up with the input from End Users.
When events occur on records in the ISNI database, all sources concerned are notified. The notification is in the form of a regular monthly XML report.
Notification fields for matches (tells you someone else has matched your data)
Recipient source code (028C $2)
Source of incoming record
Date/time of match
Matching data string
Matching data type (name and dates, name and title, partial name, date, title)
Matching score
Total evaluation score
Date/time stamp of notification
Notification fields for errors (you need to take action)
Type of error: merge, duplicate, dataError or split
Recipient source code
Recipient local identifier
Date/time stamp of field creation
Data field contents
Data field identifier (e.g. 021A = title)
Date/time stamp of notification
Should be
Correct ISNI
ExplanatoryText
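The two field lists above can be modelled as simple record types. This is only a sketch: the XML element names and schema of the monthly report are not given in these notes, so the Python attribute names, the choice of which fields are optional, and the timestamp values are assumptions. The example values are taken from the Casanova case shown earlier, with the source code left masked as in the slide.

```python
from dataclasses import dataclass
from typing import Optional

# Error types as listed in the notes.
ERROR_TYPES = {"merge", "duplicate", "dataError", "split"}


@dataclass
class MatchNotification:
    """Someone else has matched your data (attribute names are hypothetical)."""
    recipient_source_code: str
    incoming_source: str
    matched_at: str
    matching_data: str
    matching_data_type: str  # e.g. "name and dates", "name and title"
    matching_score: float
    total_evaluation_score: float
    notified_at: str


@dataclass
class ErrorNotification:
    """You need to take action (attribute names are hypothetical)."""
    error_type: str
    recipient_source_code: str
    recipient_local_id: str
    field_created_at: str
    field_contents: str
    field_identifier: str  # e.g. "021A" = title
    notified_at: str
    # Assumed optional here; the notes do not say which fields are required.
    should_be: Optional[str] = None
    correct_isni: Optional[str] = None
    explanatory_text: Optional[str] = None

    def __post_init__(self):
        if self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {self.error_type!r}")


note = ErrorNotification(
    error_type="split",
    recipient_source_code="XXX",             # masked in the slide
    recipient_local_id="109895029",
    field_created_at="2014-07-01T20:00:00",  # placeholder timestamp
    field_contents=(
        "Segunda consideração intempestiva da utilidade "
        "e desvantagem da história para a vida"
    ),
    field_identifier="021A",
    notified_at="2014-08-01T00:00:00",       # placeholder timestamp
    correct_isni="0000 0004 3077 6045",
)
```

Validating the error type at construction time means a malformed report entry fails loudly instead of being silently ignored by the receiving source.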
The ISNI assignment system also plays a vital role in maintaining the quality of the ISNI database. When errors are found if they can be fixed by program, they are. For example we found that one source was always giving a full date when only the year or month and year were known, such that there was an unusual peak in the index for e.g. 1900-01-01 and 1920-03-01 etc. The matching algorithm was adjusted to mistrust such dates, the records were found and re-matched with the new algorithm. The matching algorithm is continually being refined.
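One way to picture the date check described above is a frequency test over full dates. The notes do not say how the peak was actually detected or how the matching algorithm was adjusted, so the following is a hypothetical sketch: the function name, the first-of-month heuristic, the `factor` threshold and the sample dates are all assumptions, and dates are assumed to be ISO-formatted strings.

```python
from collections import Counter
from statistics import median


def suspicious_dates(dates, factor=5):
    """Return first-of-month dates whose frequency is at least `factor`
    times the median frequency of all dates in the sample."""
    counts = Counter(dates)
    typical = median(counts.values())
    return sorted(
        d for d, n in counts.items()
        if d.endswith("-01") and n >= factor * typical
    )


# Invented sample: two padded dates occur far more often than real ones.
sample = (
    ["1900-01-01"] * 12
    + ["1920-03-01"] * 10
    + ["1900-05-17", "1900-11-02", "1920-03-09", "1920-07-23", "1931-02-01"]
)
flagged = suspicious_dates(sample)
```

Dates flagged this way would be treated as "year only" or "month and year only" for matching purposes, rather than rejected outright, since the year is probably still correct.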
Action has been taken on most of these recommendations
It is important to fix mixed identities and duplicates at the cluster level. A record containing a mixed identity can match into a good cluster and pollute that cluster, resulting in incorrect diffusion, especially of linked data. A mixed identity record in isolation potentially causes duplicate clusters. Duplicates in a source file threaten to create duplicate clusters in VIAF.
The ISNI Quality Team currently receives email alerts at the rate of 2 to 3 per day. End user alerts are stored in hidden fields from which the system generates an email alert. The number is currently manageable, but the QT wants to be ready to scale up because the volume has been steadily increasing. Associate QT members would be primarily responsible for controlling the identities in their own sphere but would be on standby for peaks in end user requests generally.
ISNI members have privileged access to the database but without the obligations of an Associate QT member.
ISNI members have access to detailed indexes, both via the web interface and via the SRU API.
Slide by Pauline Chougnet, BnF. This slide shows the compare screen available to ISNI members. Members are able to compare records to make merge decisions. A slide set has been made by the ISNI Quality Team giving merge guidelines.
Basic statistics: “Provisional” records are those that were loaded by batch and did not match, and whose name is not unique. Members can enrich provisional records manually via the web interface to make them assigned. “Suspect” records are those that have been marked, either manually or by anomaly detection, as possibly having mixed identities. “Unique” records are assigned records with a single source; the name is unique to the database in its full and abbreviated forms. “Possible” records are provisional records that the programs have marked as matching another record but with a score below the acceptable merge threshold. Members are able to use the compare screen and make manual merges.
Cross matches indicate the numbers of records where your source and another source co-occur. Curious cases could point to mixed identities.
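A cross-match count of this kind can be sketched as a co-occurrence tally over the sources attached to each record. The sketch assumes each record is reduced to a set of source codes; the function name and the sample records are invented for illustration, using source codes from the list of current ISNI sources.

```python
from collections import Counter


def cross_matches(records, my_source):
    """Count, for each other source, the records it shares with `my_source`.

    `records` is an iterable of sets of source codes, one set per record
    (a hypothetical simplification of an ISNI record).
    """
    counts = Counter()
    for sources in records:
        if my_source in sources:
            for other in sources - {my_source}:
                counts[other] += 1
    return counts


records = [
    {"PROQ", "VIAF"},
    {"PROQ", "VIAF", "MUBZ"},
    {"PROQ", "BRTH"},
    {"VIAF", "RING"},
]
counts = cross_matches(records, "PROQ")
```

In this invented sample, a theses source co-occurring with a music source on the same record is the kind of curious case that might be worth a manual look for a mixed identity.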
This slide is just an example of the links generated by the load of La Trobe University data into ISNI. It gives the university a new view of the impact of its researchers.