The International Standard Name Identifier, or ISNI, was created to identify the millions of contributors to creative works and those active in their distribution, including researchers, inventors, writers, artists, visual creators, performers, producers, publishers, aggregators, and more, in order to resolve the problem of name ambiguity in search and discovery. Now, Laura Dawson, Product Manager of Identifier Services at Bowker, will show us how ISNI has developed since the standard was first published in 2012. How is it managed? Who receives numbers? What impact has it had on publishing? And how can it be incorporated into current metadata management and distribution?
2. What Is ISNI
• ISO Standard, published in 2012
• International Standard Name Identifier
• Numerical representation of a name
– 16 digits
– Assigned to public figures, contributors of content –
researchers, authors, musicians, actors, publishers,
research institutions – and subjects of that content (if
they are people or institutions).
– Example: 0000 0004 1029 5439
3. Who is ISNI
• Founding members
– IFRRO (International Federation of Reproduction
Rights Organizations)
– CISAC (International Confederation of Societies of
Authors and Composers)
– SCAPR (Societies’ Council for the Collective
Management of Performers’ Rights)
– OCLC
– CENL (Conference of European National Librarians),
represented by the British Library and the National
Library of France
– ProQuest, represented by Bowker
4. ISNI Organizational Structure
[Organization chart with the labels: Board of Directors; Quality Team; Members; Registration Agencies; Ongoing assignments/general public]
5. How Does ISNI Registration Work
• Publisher submits names for assignment through a Registration
Agency
• RA works with the publisher to ensure the data feed is well-
formatted, and sends that feed to the Assignment Agency
• AA assigns as many ISNIs to the names in the feed as it can, using
complex algorithms and business rules that evolve with each feed
• AA returns a file of names with ISNIs attached to them
– This may not be the full file of names
– Ambiguous names are held for review by Quality Team
– QT assignments and other exceptions (assignments as a result
of improvements to the algorithm) are returned to RA quarterly
– Process is not instant. Assignment may be immediate if the
name and other information are unique, but frequently
assignments take a week or two.
6. Stage One
• Customer submits data to Registration Agency
• Registration Agency sends file to Assignment Agency
• Assignment Agency assigns as many ISNIs to the names as it can
9. Display
• Only minimal metadata is displayed
• Not meant as a comprehensive profile
• ISNI is a tool for linking data sets, collocation, and
disambiguation
• Enhancements to the record can be made but are not
required
14. How many names in the ISNI database?
• Over 8,300,000 assigned
• 10,112,931 provisional (awaiting a match from another
data set for corroboration)
• Your author names may well already have ISNIs.
http://www.isni.org/search.
• Bowker’s Books in Print contains 2.33 million ISNIs –
33% coverage of all contributors, with more coming in
monthly.
20. Data Quality
• Based on matching names to existing records in
database (over 18 million names)
• Strict criteria for assigning ISNIs to names
• Quality team oversight (manual edits)
– British Library
– National Library of France
– OCLC
21. Assignment Criteria
• If on the common surname list:
– Birth date
– Death date
– ISBN(s)
– Title(s)
– Co-authors or institutional affiliation
• If not on the common surname list
– Title(s)
– Birth date
– Death date
– Any other distinguishing factors (“is not”)
• If unique
– Immediate assignment
22. ISNI and ORCID
• ORCID numbers are a subset of ISNI’s database
• Working towards alignment, with ultimate goal of single
assignment
• There is ISNI representation on the ORCID Technical
Steering Group, and ORCID representation on the ISNI
Technical Committee
• A researcher may have both an ORCID and an ISNI
23. If I have an ORCID, do I need an ISNI?
• Identifier across all types of works – particularly relevant
to faculty members in the Arts and Humanities
• Assigned by organizations, such as publishers and
universities, on behalf of contributor
• Link your ORCID and ISNI and be done – no need to
create yet another profile
25. Online Registration Coming in Q4 2014
• Bowker is building functionality to register individual
ISNIs online
• First for authors who already have ISBNs assigned to
them
• Phase II includes all contributors
• Contributors without ISBNs can continue to register by
emailing isni@bowker.com
16 digits – final one is a check digit so it is sometimes an X
In machine-to-machine communication, the ISNI is rendered without the spaces. We break it up into four sections just so it’s more human-readable on the web.
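The check digit and the two renderings just described can be sketched in a few lines of Python. This is a minimal sketch, assuming the check character is computed with ISO 7064 MOD 11-2 over the first 15 digits; the function names are illustrative and do not belong to any ISNI tooling.

```python
def isni_check_character(first15: str) -> str:
    """Check character for the first 15 digits (ISO 7064 MOD 11-2, assumed)."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def compact(isni: str) -> str:
    """Machine-to-machine form: no spaces or hyphens, upper-case X."""
    return isni.replace(" ", "").replace("-", "").upper()


def is_valid(isni: str) -> bool:
    """True when the value is 15 digits plus a matching check character."""
    value = compact(isni)
    body = value[:15]
    if len(value) != 16 or not (body.isascii() and body.isdigit()):
        return False
    return isni_check_character(body) == value[15]


def display(isni: str) -> str:
    """Human-readable form: four groups of four, separated by spaces."""
    value = compact(isni)
    return " ".join(value[i:i + 4] for i in range(0, 16, 4))


example = "0000 0004 1029 5439"  # the example number from the slides
print(compact(example))
print(is_valid(example))
print(display(compact(example)))
```

Storing the compact form and formatting only at display time keeps a single canonical value for matching across systems.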
As you can see, these are representatives from a wide variety of domains. IFRRO is primarily text and image-based. CISAC is concerned with music. SCAPR’s domain is film and video. OCLC, which is also the assignment agency, as well as CENL, are concerned with library usage of ISNI. ProQuest’s domain is around book publishing, web usage, scholarly research, and inclusion in semantic web ontologies.
ISNI Board of Directors is made up of representatives from each founding organization, as well as a representative from the collection of registration agencies.
The Assignment Agency – the part of ISNI that does the actual assigning of numbers to names – is OCLC. The Assignment Agency runs out of OCLC’s Leiden office in the Netherlands.
The Quality Team – composed of librarians from the British Library and the French National Library – works with the Assignment Agency to ensure that ISNI information is accurate and that disambiguations and collocations are correct. Basically, the Quality Team handles errors in the data, and continuously works on refining the assignment algorithms.
Members of ISNI include the founding members, of course, as well as Macmillan (ISNIs are in use at Digital Science), MusicBrainz, VIAF, and numerous other organizations. These members send data through a Registration Agency, to the Assignment Agency, and ISNIs are assigned to the name records in that data.
Currently, there are two Registration Agencies – Ringgold, for institutions, and Bowker, for everything else.
VIAF served as the initial data set for ISNI.
Just to give a run-through of the methodology behind ISNI assignment – ISNI is not a self-claiming system. Individuals can apply for ISNIs through a Registration Agency, but ISNIs are also assigned on behalf of authors and other contributors by their publisher or another organization that’s distributing content. ISNI profiles are not meant to be comprehensive. The ISNI website displays the minimal amount of information required to disambiguate one contributor name from another. Behind the scenes, in the ISNI database, there may be more information – which is used for disambiguation and collocation. But ISNI takes privacy very seriously and does not display more than is absolutely necessary, unless a person would like to make more information available on their ISNI page.
Run through slide.
To recap – a publisher submits a data file to a registration agency. The RA packages up that file, working with the publisher and the assignment agency to ensure the file is in an easily-process-able format. The assignment agency then assigns ISNIs to as many names as it can.
Once those assignments are made, a file is sent to the registration agency. The registration agency shares the file with the publisher, who QAs it and then uses it as they wish. Dealing with a registration agency – as opposed to many individual publishers or other institutions – simplifies the process for the assignment agency.
Given that not all names in any given file will receive an ISNI, how do updates work? The AA sends updates quarterly. The RA parses these updates and distributes the appropriate files to the publishers, who then each ingest their update.
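On the publisher side, ingesting the returned file and a later update might look like the sketch below. The record layout, the `local_id` key, and the update format (a mapping from `local_id` to ISNI) are all assumptions made for illustration; the actual file formats exchanged with a registration agency are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class NameRecord:
    local_id: str               # publisher-side key (hypothetical)
    name: str
    isni: Optional[str] = None  # stays None until a number is assigned


def split_returned_file(
    records: List[NameRecord],
) -> Tuple[List[NameRecord], List[NameRecord]]:
    """Separate names that came back with an ISNI from those still pending."""
    assigned = [r for r in records if r.isni is not None]
    pending = [r for r in records if r.isni is None]
    return assigned, pending


def apply_update(pending: List[NameRecord], update: Dict[str, str]) -> List[NameRecord]:
    """Attach ISNIs from a later update; return the records still pending."""
    still_pending = []
    for record in pending:
        if record.local_id in update:
            record.isni = update[record.local_id]
        else:
            still_pending.append(record)
    return still_pending


returned = [
    NameRecord("c-001", "Contributor A", "0000000410295439"),  # example number from the slides
    NameRecord("c-002", "Contributor B"),  # held for review, e.g. an ambiguous name
    NameRecord("c-003", "Contributor C"),
]
assigned, pending = split_returned_file(returned)
pending = apply_update(pending, {"c-002": "PLACEHOLDER-ISNI"})  # placeholder, not a real ISNI
print([r.local_id for r in assigned], [r.local_id for r in pending])
```

Keying updates on a local record identifier rather than on the name string avoids reintroducing the ambiguity that the ISNI is meant to resolve.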
As previously noted, and unlike ORCID, ISNI does not display a comprehensive profile. The number is a tool – it determines whether the Brian May who plays guitar for Queen is also Brian May the astrophysicist (he is), or Brian May the editor of an obscure photography book (he is). The number – the tool – determines that Fyodor Dostoevsky is the author of Crime and Punishment no matter how you spell his name or what character set you’re using.
And ISNI is a tool for linking data sets together. If two disparate databases – such as Books in Print and MusicBrainz – use ISNIs, then cross-domain linking is possible. This allows for a music professor, for example, to be unambiguously identified in his capacity as a session musician for Wynton Marsalis as well as in his capacity as the author of monographs on the evolution of jazz keyboard styles. An organization using both of these data sets would be able to link all the work produced by that professor regardless of whether it is audio or text.
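As a rough illustration of that kind of cross-domain linking, the sketch below groups records from two data sets under a shared ISNI. The record layouts and titles are hypothetical, and the ISNI is the example number from the slides, reused purely as a stand-in.

```python
from collections import defaultdict

# Hypothetical records from a book data set and a music data set.
book_records = [
    {"isni": "0000000410295439", "title": "Monograph on jazz keyboard styles"},
]
music_records = [
    {"isni": "0000000410295439", "recording": "Session recording"},
    {"isni": None, "recording": "Unidentified session"},
]


def link_by_isni(*datasets):
    """Group records from any number of (source, records) pairs by ISNI."""
    linked = defaultdict(list)
    for source, records in datasets:
        for record in records:
            if record.get("isni"):  # records without an ISNI cannot be linked
                linked[record["isni"]].append((source, record))
    return dict(linked)


linked = link_by_isni(("books", book_records), ("music", music_records))
for isni, items in linked.items():
    print(isni, [source for source, _ in items])
```

Neither data set needs to know the other's schema; the only shared field is the identifier.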
If a contributor wishes to enhance his or her ISNI record, that is of course possible. If an individual wishes more information to be displayed, or to correct information in the database, he or she can work with a registration agency and the ISNI Quality Team to ensure this happens.
So here’s an example of an ISNI record. You’ll notice a couple of things – the paucity of information, and the yellow box. Clicking on the yellow box leads a user to a basic online form where they can submit additional information or corrections for the record. That information is evaluated by the Quality Team, and implemented if it’s determined to be accurate. Submissions by the actual contributor are enormously welcome – because, of course, the contributor is the best possible source!
This is just an example of the data sets that are already using ISNI. You’ll note that ORCID is using ISNIs – this is to identify research institutions, which are regarded as contributors to research being done there. Wikipedia is using ISNIs as well, as are Scholar Universe and other ProQuest products. ISNI allows any one of these organizations to transmit contributor data to any other one of these organizations. It also allows a third party to combine data sets and link them through the common ISNI. In this way, ISNI serves as a bridge identifier amongst disparate data sets.
Just a rundown of some organizations already using ISNIs. We are currently piloting with BookNet Canada and the Authors Guild.
And an example of ISNI use in Wikipedia. Clicking on the ISNI takes you to the VIAF entry for that contributor – that’s how Wikipedia decided to use ISNI.
We have over 8.3 million ISNIs assigned to names in the ISNI database, with an additional 10 million awaiting a corroboratory match. Because the primary application of ISNI is in Linked Data, large data sets have served as the basis for the ISNI database. Recruiting one contributor at a time – given the large number of domains that exist for contributors – is not feasible for ISNI implementation; it would take far too long. So assignment is fairly automated, as we’ve discussed, and geared towards large data sets.
These are two different theologians. But Random House is publishing both of them. They must make sure that they are getting the appropriate royalties to the correct author. The subject matter is not enough to distinguish the two, and middle names are not always consistently listed – how do we know for certain that these are two different individuals? The ISNI helps the publisher definitively disambiguate the two, and pay correctly.
This is a directory of researchers at Xerox PARC. How can they be sure that research published under “Y. Wang” is credited to the right person? Even “Yu Wang” might lead to some mistakes. Using ISNIs definitively separates the research of Yu Wang from Yunda Wang.
At Arizona State, there are two faculty members named Michael White who both work in the legal area. Again, area of research is not necessarily an effective disambiguator – but a numerical distinction provides clarity.
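The difference between crediting by name string and crediting by identifier can be shown with a small sketch. The publication list is invented for illustration, and "ID-A" and "ID-B" are placeholders rather than real ISNIs.

```python
from collections import defaultdict

# Hypothetical publication list; "ID-A" and "ID-B" are placeholders, not real ISNIs.
publications = [
    {"byline": "Y. Wang", "isni": "ID-A", "title": "Paper 1"},
    {"byline": "Y. Wang", "isni": "ID-B", "title": "Paper 2"},
    {"byline": "Yu Wang", "isni": "ID-A", "title": "Paper 3"},
]


def group_titles(records, key):
    """Group publication titles by the given field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


by_name = group_titles(publications, "byline")
by_isni = group_titles(publications, "isni")
print(by_name)  # one byline mixes two people and splits one person across bylines
print(by_isni)  # each identifier collects exactly one person's work
```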
Here we have the ISNI record for Brian May. As you can see, he’s done a few things – as an author, a creator, and a performer.
As you can see, the ISNI links together disparate types of data. Brian May, noted guitarist for the band Queen, is also an astrophysicist who has published his dissertation “A survey of radial velocities in the zodiacal dust cloud”. He has ALSO annotated a seminal collection of stereoscopic photographs. A true Renaissance man – and the ISNI allows for these works to be linked together under his unique, persistent identifier. So that if someone sees his dissertation or his photograph collection, and questions whether or not this is the same Brian May that played “Bohemian Rhapsody”, they can confirm that yes, this is the same (rather unusual) person.
Data quality is paramount for ISNI. The quality team work through over 17 million records. And, as I mentioned, the data quality team also solicits individual feedback – crowdsourcing, if you will, but with editorial oversight. With a data set this large, accuracy is an enormous challenge, so the role of the data quality team is critical.
This is just a list of criteria for how the assignments get made. ISNI has compiled a list of common surnames which dictates how much other data is necessary for assignment.
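To make the role of the common surname list concrete, the sketch below reports which of the slide's criteria fields are populated for a record. It is only an illustration: the field names, the sample surname list, and the record layout are assumptions, and the actual rules for how much evidence is enough are not given here, so the function deliberately stops short of deciding whether to assign.

```python
# Field lists follow the Assignment Criteria slide; the field names are illustrative.
COMMON_SURNAME_FIELDS = [
    "birth_date", "death_date", "isbns", "titles", "co_authors_or_affiliation",
]
OTHER_SURNAME_FIELDS = [
    "titles", "birth_date", "death_date", "other_distinguishing_factors",
]


def evidence_available(record, common_surnames):
    """Return the criteria fields that are populated for this record."""
    if record["surname"].lower() in common_surnames:
        wanted = COMMON_SURNAME_FIELDS
    else:
        wanted = OTHER_SURNAME_FIELDS
    return [field for field in wanted if record.get(field)]


common_surnames = {"wang", "white"}  # hypothetical sample, not the real list

record = {"surname": "White", "titles": ["Example title"], "birth_date": None, "isbns": []}
print(evidence_available(record, common_surnames))
```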
ORCID is the Open Researcher and Contributor ID – in use primarily in STM publishing.
ISNI designated a set of a million numbers for ORCID’s use. This is so there won’t be any data integrity issues – particularly among data sets using both identifiers.
We are, in fact, working extremely closely with ORCID. Our mechanisms for assignment are different, but we do have the ultimate goal of having a single number serve both the ORCID and ISNI communities – similar to how an ISBN is also an EAN. There’s a lot of work we have to do to get there (issues around deprecation, redirection, and criteria for assignment – as ORCID only requires an email address), but we are on one another’s technical committees, and are in talks at the board level to create an alignment plan. We’ll have more information after an ORCID-ISNI joint board meeting later this month. In the meantime, a researcher may have both an ORCID and an ISNI.
Because ISNIs are assigned through library organizations, large databases, and other sources, it’s quite possible you might already have one. You can go to http://www.isni.org/search to check and see.