NISO's IOTA Working Group: Creating an Index for Measuring the Quality of OpenURL Links
1. NISO’s IOTA Working Group
Creating an Index for Measuring the Quality of
OpenURL Links
Charleston Conference - Nov. 5, 2010
Rafal Kasprowski, Rice U.
Susan Marcin, Columbia U.
2. Agenda
• Background: Full-text linking and Advent of OpenURL
• Problems with OpenURL linking
• IOTA: Created in response to OpenURL linking problems
• Overview of IOTA's work
• IOTA project site
• Community-derived data reports
• Publicizing IOTA: wiki, blog, Twitter
• Get involved and Take advantage of current effort
3. Before OpenURL: Proprietary Linking
• A&I database providers offered an option for full-text linking
(e.g., CSA, PubMed).
• Libraries manually activated linking to full-text providers
they had subscriptions with.
• A&I --> Full Text
4. Proprietary Linking: Cons and Pro
• Linking had to be activated manually by libraries for
each full-text provider.
• A&I providers offering this option were few.
• Selection of full-text providers was limited.
But...
• Once set up, the static links to full texts were
accurate.
5. Advent of OpenURL
• Objective: Deliver full texts unrestrained by proprietary silos.
• Open standard generating link at time of request.
• Library's holdings indicate provider of "appropriate copy".
• A-Z list (e.g., e-journal, e-books):
o Knowledge base (KB) with library's holdings.
o Intermediary in linking.
• A&I ("Source") --> A-Z list ("KB") --> Full Text ("Target")
6. OpenURL Syntax and Resolver
A, Bernand, et al. "A versatile nanotechnology to connect individual nano-objects
for the fabrication of hybrid single-electron devices." Nanotechnology 21, no. 44
(November 5, 2010): 445201. Academic Search Complete, EBSCOhost (accessed
October 24, 2010).
http://ps4ps6lm2r.search.serialssolutions.com/?issn=0957-4484&volume=21&issue=44&date=20101105&spage=445201&title=Nanotechnology&atitle=A+versatile+nanotechnology+to+connect+individual+nano-objects+for+the+fabrication+of+hybrid+single-electron+devices.&aulast=A++Bernand
Source Citation
Target OpenURL (Source OpenURL structured similarly)
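A minimal sketch of how the citation elements can be read back out of an OpenURL like the one above, using only Python's standard library. The function name `openurl_elements` is hypothetical, and the example URL is the slide's link with the article title shortened for readability.

```python
from urllib.parse import urlsplit, parse_qs


def openurl_elements(url):
    """Return a dict mapping each OpenURL key to its first decoded value."""
    query = urlsplit(url).query
    # parse_qs decodes '+' as a space and skips keys whose value is empty.
    return {key: values[0] for key, values in parse_qs(query).items()}


# The slide's example link, with atitle shortened (an illustrative edit).
example = (
    "http://ps4ps6lm2r.search.serialssolutions.com/?issn=0957-4484"
    "&volume=21&issue=44&date=20101105&spage=445201&title=Nanotechnology"
    "&atitle=A+versatile+nanotechnology"
)

elements = openurl_elements(example)
for name in sorted(elements):
    print(name, "=", elements[name])
```

Each key (issn, volume, spage, and so on) is one citation metadata element that the source passes to the resolver.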
7. Pros & Cons of OpenURL
Pros:
• KB/Resolver vendors took over most of the linking setup:
Less work for libraries and providers.
• Participation by A&I platforms and full-text providers
exceeded proprietary linking: OpenURL scales better
Cons:
• Dynamic linking less predictable than static linking: more
difficult to pinpoint cause of link failures
• OpenURL linking has not improved significantly in the last 10
years.
• No systematic method exists to benchmark OpenURLs.
8. Problem Statement & Methodology
"72% of respondents to the online survey either agreed or
strongly agreed that a significant problem for link resolvers is
the generation of incomplete or inaccurate OpenURLs by
databases (for example, A&I products)."
Culling, James. 2007. Link Resolvers and the Serials Supply Chain: Final Project Report for UKSG,
p.33. http://www.uksg.org/sites/uksg.org/files/uksg_link_resolvers_final_report.pdf.
Recently, researchers have indicated the need for metadata
quality metrics, including:
• completeness;
• accuracy;
• conformance to expectations;
• logical consistency and coherence.
Bruce, Thomas R. and Hillmann, Diane I. 2004. The Continuum of Metadata Quality: Defining,
Expressing, Exploiting. In Metadata in Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks.
Chicago: American Library Association, pp. 238-256.
9. Année philologique OpenURL Study
2008 Cornell study led by Adam Chandler*
• Problem: Too often links sent from Aph did not
successfully resolve to requested resource.
• Objective: Examine quality of OpenURLs offered to
users by Aph in order to improve the linking.
Aph Study investigated:
• Faulty citation metadata from source database.
• Method to evaluate the OpenURLs.
*Chandler, Adam. 2009. Results of L’Année philologique online OpenURL Quality Investigation:
Mellon Planning Grant Final Report.
http://metadata.library.cornell.edu/oq/files/200902%20lannee-mellonreport-openurlquality-final.pdf.
10. Scoring System & Aph Study Outcomes
Concept of scoring in Aph study (based on B. Hughes study)*
• establish a baseline for comparison;
• results to be shared with data providers;
• develop a best practice.
Problem analysis in Aph study limited to:
• source link
• presence/absence of citation metadata elements
Results:
• OpenURL quality model: compares elements in Aph
OpenURLs to those of other providers.
• No scores, but model is first step towards scoring system.
*Hughes, Baden. 2004. Metadata Quality Evaluation: Experience from the Open Language Archives
Community. In Digital Libraries: International Collaboration and Cross-Fertilization. Ed. Zhaoneng
Chen et al. Berlin: Springer-Verlag, 2004, pp. 320-329.
11. Creation of IOTA
NISO accepts proposal to take Aph Study to wider community
• Improving OpenURLs Through Analytics (IOTA):
o Formed in January 2010.
Basic Assumptions:
• Results are achieved through an analytical investigation of
how OpenURL links work.
• Practical: The focus is not the OpenURL standard itself, but the
links (OpenURLs) generated under the standard.
• Selective changes to OpenURLs will lead to significant
improvement in linking success rate.
o Motto: "small changes. big improvements"
12. Desired Outcomes
• Develop community-recognized index for measuring the
quality of OpenURL links generated by content providers.
• Produce qualitative reports that will help OpenURL
providers quickly compare their OpenURL quality to that of
their peers.
• Method:
o fair;
o transparent;
o scalable across all OpenURLs and their providers.
13. Initial Focus
1. Core elements:
• Contained in IOTA's OpenURL reporting system;
• 9 million OpenURLs from libraries and providers.
2. Scoring system based on assumption:
• Correlation exists between
o # of core elements ("OpenURL completeness") &
o ability of OpenURLs to link to specific content.
3. Weighting assigned to core elements:
• Based on relative importance
o spage vs atitle
o issn vs jtitle
o doi/pmid vs date, etc.
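A rough sketch of what a weighted "completeness" score could look like. The weights below are purely hypothetical placeholders, not IOTA's weights, and the function name `completeness_score` is likewise an assumption made for illustration.

```python
# Hypothetical weights for illustration only; these are not IOTA's values.
WEIGHTS = {
    "doi": 5, "pmid": 5,        # identifiers
    "issn": 3, "spage": 3,      # strong bibliographic anchors
    "volume": 2, "date": 2,
    "issue": 1, "atitle": 1, "jtitle": 1, "aulast": 1,
}


def completeness_score(elements, weights=WEIGHTS):
    """Weighted share (0.0 to 1.0) of core elements present in one OpenURL.

    `elements` maps element names to values; empty values count as absent.
    """
    present = sum(w for name, w in weights.items() if elements.get(name))
    return present / sum(weights.values())


citation = {"issn": "0957-4484", "volume": "21", "spage": "445201", "date": "20101105"}
print(round(completeness_score(citation), 3))
```

A score like this only measures which elements are present; it says nothing about whether their values are accurate.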
14. Work in Progress
• Element weighting still in progress:
o E.g., importance of identifiers (doi, pmid) vs
bibliographic data (issn, volume, spage).
• Currently, IOTA focuses on OpenURLs from citation
sources only. OpenURL quality is also influenced by:
o knowledge base,
o resolver,
o full-text provider (target).
• High "completeness" score of OpenURLs not always
indicative of "success" in linking to full texts
o Combination of multiple indexes along linking nodes
may provide more complete picture.
15. Issues with openURL linking
• Why might "article" links not appear or work?
o target vendor does not support openURL linking
o problems with the information passed from the source into the
openURL string, e.g., not enough info or incorrect info
29. IOTA Blog & Twitter
http://openurlquality.blogspot.com/
Find us on Twitter
Hash Tag #nisoiota
30. How can I get involved in IOTA?
If you are a vendor --
• Look at IOTA data ... or...
• Point your tech folks to our data
• This data is meant to help make improvements in your
openURL linking.
If you are a librarian --
• Contribute data to IOTA
• Help spread the word to vendors
• More information on the IOTA project at:
http://openurlquality.niso.org/
Link resolver translates between Source and Target
Resolver checks the KB for the provider the library subscribes to and links to that appropriate copy.
Citation info is transferred from Source to Resolver to Target within URL syntax.
Advent of KB/Resolver vendors.
Information Technology & Libraries, 2003 --
The openURL standard isn't perfect. It doesn't fix data discrepancies. It assumes that the data being transported from one system can be properly interpreted and matched in a second system.
Clifford Lynch, 1997 --
This problem will gradually fade as more use is made of such linking elements and errors are reported and corrected. Some vendors will improve the quality of the linking elements in their A&I files; others will be known for offering "linking hostile" files and will consequently face a competitive disadvantage in the marketplace.
GOAL OF IOTA
Publicize documentation and tools for vendors who generate openURLs, so that they may improve overall quality and success.
IOTA Project Site: http://openurlquality.niso.org/
All links to IOTA sites contained on this main project site:
reports from log files
blog
documentation wiki
twitter feed
Will break it down in the following slides
IOTA = Improving OpenURLs Through Analytics
Focus on data being passed via openURL from the citation (source) to the openURL resolver page.
IOTA project site: http://openurlquality.niso.org/
reports from log files
SFX
WebBridge
Serials Solutions
WorldCat Link Manager
Thomson Reuters
AIP
Over 9 million openURL strings analyzed so far
So what do we look at when we analyze openURL strings...
** TWO TYPES OF REPORTS **
Source Reports
run vendor and source identifier (sid) / database reports.
Element and pattern frequency across source vendors or source databases
analyzes the number of times a core element has been included in an openURL string for each source database or source vendor
log files providers
Source Reports
presents the options for creating vendor and source identifier (sid) centric reports.
You can select either an openURL vendor or package, as well as the time frame and the contents of the report: pattern or element.
report type = elements or patterns
source = vendor or database
year = 2009 or 2010
quarter = q1 through q4
year = 2010
all quarters
all log files
vendor = ISI
2,238 openURLs analyzed
columns compare the percentage of ISI openURLs that contain an element with the percentage across all log files
What do we see here?
ISI doesn't seem to be passing a DOI, PMID, book title, journal title, or eISSN
It looks like ISI primarily focuses on passing:
title
start page
volume
issue
date
author's last name
ISSN
article title
ISBN
year = 2010
all quarters
all log files
vendor = EBSCO
773,730 openURLs analyzed
columns compare the percentage of EBSCO openURLs that contain an element with the percentage across all log files over the same time period
What do we see here?
EBSCO passes a higher percentage of article titles and dates in their openURL strings
they have a lower percentage of article numbers, author last names, DOIs, and e-ISSNs.
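The comparison these reports make can be sketched as follows: for each core element, compute the percentage of one vendor's openURLs that contain it, and set that beside the percentage across all log files. The function name `element_frequency`, the vendor labels, and the four-record log are made up for illustration and do not come from IOTA's data or code.

```python
from collections import Counter


def element_frequency(openurls, core_elements):
    """Percentage of openURLs (dicts of element -> value) containing each element."""
    counts = Counter()
    for elements in openurls:
        for name in core_elements:
            if elements.get(name):  # empty or missing values count as absent
                counts[name] += 1
    total = len(openurls)
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in core_elements
    }


# Made-up log entries; "vendor" stands in for the source vendor of each openURL.
log = [
    {"vendor": "A", "issn": "0957-4484", "date": "2010"},
    {"vendor": "A", "issn": "0957-4484"},
    {"vendor": "B", "doi": "10.1000/example", "date": "2010"},
    {"vendor": "B", "date": "2009"},
]
core = ["issn", "date", "doi"]

vendor_a = [entry for entry in log if entry["vendor"] == "A"]
report_a = element_frequency(vendor_a, core)
report_all = element_frequency(log, core)

for name in core:
    print(f"{name}: vendor A {report_a[name]:.0f}% vs all logs {report_all[name]:.0f}%")
```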
Element and pattern frequency across source vendors or source databases
provides an analysis of the number of times a core element has been included in an openURL string for each source database or source vendor in the log file.
provides options for selection of core element to view, time period and whether you would like to see the information at the database or vendor level.
Source = vendor or database
Once you choose a vendor or database, you must choose a metric to analyze
This screen shows the metric "issue" for:
Cornell University
2010
All quarters
126,895 openURLs analyzed
It shows the percent of openURLs that contain an "issue."
What does this tell us? 31% of Brepols openURLs that are passed to an openURL resolver contain an issue.
Listed by source:
ex:)
AMS = American Mathematical Society
Brepols
CAS = Chemical Abstracts Services
Sources are clickable!
If you are curious about a metric, for example "31% of Brepols openURLs that are passed to an openURL resolver contain an issue," you can click on Brepols for more information.
This will show you the source report from Brepols, with the previous limits in place:
Cornell
2010
all quarters
Documentation link from main IOTA site takes you to our wiki
The IOTA wiki contains user documentation that offers more explanation of how to use the IOTA openURL reporting tools.
This includes:
IOTA FAQ
Processor documentation
User interface documentation
Glossary of terms
Our wiki site also contains a glossary to help define and offer context to the terms used in our IOTA reporting tool.
This includes describing:
Elements
Targets
Patterns
Source IDs or SIDs
and Vendors
IOTA Project Site: http://openurlquality.niso.org/
IOTA has a blog that reports on recent news and activities of our group, as well as select recent news in openURL.
One very interesting recent blog post dealt with DOI links not always leading to appropriate copies:
"The DOI doesn’t always lead to the appropriate copy, it leads to the publisher’s copy."
IOTA has a Twitter presence.
Perhaps I should have mentioned this earlier, but if anyone is tweeting this, our hash tag is:
#nisoiota
A regular question that follows IOTA conference presentations is "how can I get involved in the IOTA project?"
If you are a vendor:
Start using our reporting tools and point your backend tech people to them so they can make improvements.
Please stress that we want the system to be a tool for them to make the user experience better.
And since we batch the data into quarters, a manager can actually monitor their own company's progress.
If you are a library:
Spread the word
A QR Code is a matrix barcode (or two-dimensional code), readable by QR scanners, mobile phones with a camera, and smartphones. The code consists of black modules arranged in a square pattern on white background. The information encoded can be text, URL or other data.
QR is the initialism of Quick Response, as the creator intended the code to allow its contents to be decoded at high speed.