Presentation to the Authority Control Interest Group at ALA Midwinter, Chicago 2015. Discusses the traditional function of authority control and its limitations, as well as newer sources of identification for people that broaden our ideas of what identity should be.
2. Topics
Questioning our assumptions about Authority Control
Differences between records and people
NAF ‘Work’ records
Limitations of our current approach
What’s online now?
Other sources of name data?
2/1/15ACIG/MW 2015 2
4. Current Approach
Limitations of library authority control:
Focus is on name variants to support unique text
strings
Record IDs in NAF, SAF and VIAF are not name IDs
ORCID, ISNI, etc. intended to identify person
Rules don’t support references and ‘outlinks’
Centralized management both a strength & weakness
Doesn’t address needs for more automated solutions
5. Name Aggregation
VIAF—an aggregation of authority records
Records gathered are used to create services and
visualizations
Timelines
Associates, works, publishers, etc.
Versioned in a manner different from id.loc.gov, but with
significant limitations
Policy documentation missing
9. Usage Questions
If we use URIs for names and subjects, should we
also cache the data behind them?
If we cache, we need to worry about change
management
What kind of support will we expect from external
systems? How do we express those expectations?
What constitutes change in a name ID file?
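One way to picture the caching question is a small local cache that records a digest of the data behind each URI and keeps superseded copies for roll-back. This is only a sketch under stated assumptions: the `UriCache` class, the injected `fetch` function, and the example.org URI are hypothetical, and a byte-level digest flags every difference, so it does not answer what should count as a meaningful change in a name ID file.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass
class CachedEntry:
    uri: str
    body: str
    digest: str
    fetched: date


def digest_of(body: str) -> str:
    """Fingerprint the cached data so later fetches can be compared."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class UriCache:
    """Hypothetical local cache of the data behind name/subject URIs."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch  # assumed: returns the current data for a URI
        self.entries: Dict[str, CachedEntry] = {}
        self.history: Dict[str, List[CachedEntry]] = {}  # kept for roll-back

    def refresh(self, uri: str, today: date) -> str:
        """Re-fetch a URI and report 'new', 'unchanged', or 'changed'."""
        body = self.fetch(uri)
        entry = CachedEntry(uri, body, digest_of(body), today)
        old = self.entries.get(uri)
        if old is None:
            self.entries[uri] = entry
            return "new"
        if old.digest == entry.digest:
            return "unchanged"
        # Retain the superseded copy so earlier versions stay reachable.
        self.history.setdefault(uri, []).append(old)
        self.entries[uri] = entry
        return "changed"
```

Keeping the superseded entries is what gives the cache a path back to previous versions when the external system offers none.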
11. ORCID
“ORCID provides a persistent digital identifier that
distinguishes you from every other researcher and,
through integration in key research workflows such as
manuscript and grant submission, supports automated
linkages between you and your professional activities
ensuring that your work is recognized.”
Began as a way to disambiguate scientific
researchers, now more broadly used
Encourages linking ORCID with other identifiers
13. ISNI
Audience: libraries, publishers, databases and rights
management organizations
Website and API access
Limited online input
Online enrichment data moderated
Users must request ISNI through organizations
charged with maintaining the information
14. NAF ISNI ORCID
NAF:
*Centrally managed and distributed
*Expert input only
*Files available, no persistent deletes
ISNI:
*Centrally managed
*Some ‘improvement’ by non-experts
*Must be signed in to add data
ORCID:
*Self-registration and content management by acct. owner
*Some ‘private’ data not available via public API
*Member institutions can integrate access and update
16. Developed for ...?
Managing fixed name/title strings as ‘works’ made
more sense in the catalog card days
Does it make sense now?
Will RDA supplant this tradition with ‘work’ entity
records?
Some compilations (Bible, historically anonymous
titles) may require a hybrid approach
17. Dealing with change
Many flavors of version control!
Fine granularity at transaction level
Dated URIs (may be links to earlier versions)
Last date only (unspecified changes)
Linked access to old versions
Dated release number (and sometimes diffs)
Most recent raw file availability
18. Alternatives?
We need to find something which is optimized for
automated updating
The model for software versioning and updating is
already used by all of us (even if we’re unaware of it)
‘Semantic versioning’ (semver.org) can be used to
bring similar version control options to semantic
information (elements, vocabularies, etc.)
19. Semantic Versioning
3 tier numbering system: major.minor.patch
X.X.X
Major: breaks backwards semantic compatibility
Minor: change in semantics of any property of any element
Patch: no change in semantics of any element
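The three tiers can be sketched as a small function that a vocabulary manager might use to pick the next version number. This is an illustration, not a prescribed tool: the `Version` type and the change labels `"breaking"`, `"semantic"`, and `"editorial"` are hypothetical names for the major, minor, and patch cases described on this slide. Resetting the lower tiers to zero follows the semver.org convention.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def bump(version: Version, change: str) -> Version:
    """Return the next version for a given kind of vocabulary change."""
    if change == "breaking":
        # Breaks backwards semantic compatibility: new major version.
        return Version(version.major + 1, 0, 0)
    if change == "semantic":
        # Semantics of some property of some element changed: new minor version.
        return Version(version.major, version.minor + 1, 0)
    if change == "editorial":
        # No change in semantics of any element: new patch version.
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change}")
```

For example, `bump(Version(1, 4, 2), "semantic")` yields version 1.5.0, while an editorial fix to the same starting point yields 1.4.3.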
20. Smart Semantics
Smooth interaction between application and vocab
Transparent to users (until major change requires some
user decisions)
Distributed version control (Git, etc.)
Vocabulary managers trusted to comply with (simple)
semantic versioning policies and practices
And encouraged to provide details of semantic
breakage between major versions
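A rough sketch of how an application could stay transparent to users until a major change arrives: compare the version it holds with the version the vocabulary publishes, update automatically for minor and patch releases, and ask the user only when the major number rises. The function names and the returned action labels are hypothetical, and the sketch assumes plain `major.minor.patch` strings with no pre-release suffixes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def update_action(installed: str, available: str) -> str:
    """Decide what an application should do about a published vocabulary version."""
    current = parse_version(installed)
    latest = parse_version(available)
    if latest <= current:
        return "up-to-date"
    if latest[0] > current[0]:
        # Major change: semantics may have broken, so a person must decide.
        return "ask-user"
    # Minor or patch change: safe to apply without user involvement.
    return "auto-update"
```

This only works if vocabulary managers apply the versioning policy consistently, which is why the trust placed in them matters.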
22. Remaining Questions
How do we make the shift from assuming human one-by-one lookup to the kind of environment we see in the software industry?
Is that lack of capability one of the reasons that the
vendors have been holding back?
How much of a problem is it that the ‘new IDs’ (ORCID and ISNI) don’t seem to do semantic versioning? Are they assuming only lookup will maintain their ‘share of the market’?
23. No More Handcrafting!
RIMMF’s automated approaches emphasize using
available sources, like NAF
If a name cannot be made unique, does it matter?
Moves the requirement of uniqueness to URI, not string
Does that mean we can stop worrying about
undifferentiated names?
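To make the shift from string to URI concrete, here is a minimal sketch in which two people share the same name string but remain distinct because each record is keyed by its URI. The records and the example.org URIs are invented for illustration and do not come from any real authority file.

```python
from collections import defaultdict

# Hypothetical records: two different people share one name string.
records = [
    {"uri": "http://example.org/names/1", "label": "Smith, John"},
    {"uri": "http://example.org/names/2", "label": "Smith, John"},
    {"uri": "http://example.org/names/3", "label": "Austen, Jane"},
]

# Keyed by URI, every record is unique even when the labels collide.
by_uri = {record["uri"]: record for record in records}

# Grouping by label shows which strings alone cannot tell people apart.
uris_by_label = defaultdict(list)
for record in records:
    uris_by_label[record["label"]].append(record["uri"])

shared_labels = {
    label: uris for label, uris in uris_by_label.items() if len(uris) > 1
}
```

Here `shared_labels` holds only "Smith, John", with both URIs listed, so the identical string is no longer an obstacle to telling the two people apart.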
Editor's Notes
What to do about aggregations like VIAF? Are they truly a step forward?
Still focused on disambiguation of names, but in an international context. The IDs are primarily record based (or based on the aggregation of records), except for the ISNI, labeled as a ‘test’. VIAF has a limited notion of co-authors (their gathering term for relationships of various kinds). For instance, Emma Thompson is listed as a co-author.
This is the bottom part of the version history on VIAF for Jane Austen, showing the link for RSS Feeds. Each change is dated and the transaction typed, but some of the links do not resolve.
id.loc.gov is a site used by LC to expose some of its vocabularies; the relator terms are one of them. A few years ago they were refactored, and significant changes were made. There are very sparse indications of the change and no way to see the prior version, which makes all the modifications listed in the ‘Change Notes’ completely opaque.
Examples of change in name. Why caching is used: protects systems from network slowdowns or failures. Plus, if the caches are retained for the purposes of ‘roll-back’ there remains a raw path to previous versions.
One of the newer ‘name’ information and ID systems is ORCID, begun by a consortium of science publishers to assist in identifying researchers whose publications carry truncated names (surname and forename initials). The information shown here is for my ORCID iD, for which I provided the information about my education, employment, and professional efforts (including papers). The information for this ID goes on for several pages, and can be edited when signed in. ORCID grows through self-registration by researchers.
ISNI uses available, trusted sources to set up information and identifiers. The sources they use do not always include individuals without formal publications, but there is a process for getting an ISNI for yourself, though it is not as simple or as extensive as ORCID’s. If you take the invitation to improve the record, you are sent to a very unstructured form. You don’t have to be signed in, but there is a Captcha on the form.
I didn’t initiate the creation of the ISNI ID—it was pulled from my LCNAF record. Getting an ISNI if there’s nothing in any of the source databases
Fine granularity (OMR)
Dated URIs (DCMI and Dewey)
Last date only (id.loc.gov and BibFrame)
Linked access to old versions (schema.org)