2. What we will cover
• IRUS context and overview
• How does it work?
• Usage data
• Collecting
• Handling
• Processing
• Storing
• Exposing statistics using the API and examples
• What is next?
• Q&A
2 IRUS: from counting clicks to COUNTER stats - 20 September 2022
3. IRUS context
IRUS: Open and flexible access to comparable and standardised usage statistics for repositories
• Based on the COUNTER Code of Practice, the international standard for measuring usage of e-resources
• 199 active participating repositories across 159 organisations
• Over 17 million individual items
• Between 2M and 6M usage events received daily
IRUS services: IRUS-UK, IRUS-CORE, IRUS-ANZ, IRUS-US, IRUS-OAPEN
4. High-level overview
Collect raw usage data
• Repositories send logs via the Tracker Protocol
Process into COUNTER stats
• Filter out robot and rogue usage, and double-clicks
• Add metadata
Enrich with additional information
• ORCIDs
• IRUS item types
Expose
• API based on the COUNTER SUSHI standard
Present and export
• Web reporting interface
• Widget
Curate the data
5. How we collect usage data – the Tracker Protocol
• We need a standard approach to collecting raw usage data when repository item pages are viewed and associated files are downloaded
• The Tracker Protocol
• Devised in collaboration with COUNTER
• A user* clicks on a link to an item page (i.e. views item metadata) or to an associated file (i.e. requests a download)
• An OpenURL-like log entry, a “tracker message”, is sent to a URL endpoint on the IRUS server for further processing
• Tracker messages are stored in daily** log files
• The Tracker Protocol specification for COUNTER R5 conformance
* The ‘user’ could be a human or a machine
** The date messages are received, which isn’t necessarily the same as the date a usage event happened
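To make the tracker message idea concrete, here is a minimal Python sketch that builds one OpenURL-style message for a single usage event. The endpoint is hypothetical, and the key names (`url_ver`, `req_id`, `req_dat`, `rft.artnum`, `svc_dat`, `rfr_dat`, `rfr_id`, `url_tim`, `rft_dat`) are our assumption based on OpenURL key/encoded-value conventions; check the published Tracker Protocol specification before relying on any of them.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; each participating repository is given the real one.
TRACKER_ENDPOINT = "https://irus.example.org/counter/"


def build_tracker_message(ip, user_agent, item_id, requested_url,
                          referrer, repository_host, event_time,
                          is_download=True):
    """Return an OpenURL-like tracker message (as a URL) for one usage event.

    The key names are assumptions modelled on OpenURL KEV conventions,
    not a verified copy of the Tracker Protocol specification.
    """
    params = {
        "url_ver": "Z39.88-2004",          # OpenURL version string
        "req_id": ip,                       # requester IP address
        "req_dat": user_agent,              # requester user agent
        "rft.artnum": item_id,              # item identifier, e.g. an OAI identifier
        "svc_dat": requested_url,           # page or file requested
        "rfr_dat": referrer,                # HTTP referrer, may be empty
        "rfr_id": repository_host,          # repository sending the message
        # When the event happened, normalised to UTC
        "url_tim": event_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Download (Request) versus metadata view (Investigation)
        "rft_dat": "Request" if is_download else "Investigation",
    }
    return f"{TRACKER_ENDPOINT}?{urlencode(params)}"


message = build_tracker_message(
    ip="192.0.2.1",
    user_agent="Mozilla/5.0",
    item_id="oai:repo.example.org:123",
    requested_url="https://repo.example.org/123/1/paper.pdf",
    referrer="https://www.example.com/search",
    repository_host="repo.example.org",
    event_time=datetime(2022, 9, 20, 10, 30, tzinfo=timezone.utc),
)
fields = parse_qs(urlsplit(message).query)
print(fields["url_tim"], fields["rft_dat"])
```

Sending the message would then be an HTTP request to this URL; the sketch stops short of that so it runs without network access.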
6. Tracker Protocol Implementations
• Various software platforms underpin Institutional Repositories
• Each needs its own Tracker Protocol implementation
• Out-of-the-box standard implementations:
• DSpace, EPrints, Figshare, Haplo, Fedora-Samvera (on-the-fly, as usage occurs)
• Worktribe (batch data, previous day’s usage)
• Out-of-the-box partial implementation:
• Elsevier Pure (batch data, previous day’s usage)
• Only sends data about file downloads, not metadata views
• Bespoke standard implementations:
• CORE, Equella, Other (on-the-fly, as usage occurs)
• Esploro, Fedora-Other (batch data, previous day’s usage)
• See https://irus.jisc.ac.uk/r5/participate/implement/
7. Processing log file usage data
• Takes place every day at 3:30am
• A scheduled task processes data in the previous day’s log files
• To put it simply:
• Gets rid of ‘rubbish’ usage data it finds in the logs
• Puts eligible usage event data into a Tracker Data table for further processing
• It’s easier to describe more fully in a diagram . . .
8. Daily Tracker Log Processing – scheduled process at 3:30am each day
Inputs
• Tracker data from 199 repositories, sent on the fly or as a daily batch and stored in daily log files
Reference tables
• Processing History, Trackers, Repositories, Server Authority and Blacklisted servers tables
Tracker Log Processing Script
• Removes COUNTER Robot Exclusions, fake referrers, malformed messages and messages from blacklisted servers
Outputs
• Messages from unknown repositories go to the Unregistered Tracker Data table
• Eligible messages from registered repositories go to the Monthly Tracker Data table
• Summary reports
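The routing described for this step can be sketched as a classifier applied to each parsed message. Everything here is illustrative: the robot patterns, referrer and server lists, the `server_ip` field and the order of the checks are assumptions, and the real COUNTER Robot Exclusions list is far longer than three patterns.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical stand-ins for the reference data used by the real script.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot", r"crawler", r"spider")]
FAKE_REFERRER_HOSTS = {"spam-referrer.example"}
BLACKLISTED_SERVERS = {"198.51.100.7"}
REGISTERED_REPOSITORIES = {"repo.example.org"}
REQUIRED_FIELDS = ("req_id", "req_dat", "rft.artnum", "rfr_id", "url_tim")


def classify(message):
    """Return the outcome for one parsed tracker message (a dict of fields)."""
    if any(not message.get(field) for field in REQUIRED_FIELDS):
        return "malformed"
    if any(p.search(message["req_dat"]) for p in ROBOT_PATTERNS):
        return "robot"
    if urlsplit(message.get("rfr_dat", "")).hostname in FAKE_REFERRER_HOSTS:
        return "fake_referrer"
    if message.get("server_ip") in BLACKLISTED_SERVERS:  # hypothetical field
        return "blacklisted"
    if message["rfr_id"] not in REGISTERED_REPOSITORIES:
        return "unregistered"
    return "eligible"


def process_log(messages):
    """Split a day's messages into eligible and unregistered lists plus a summary."""
    eligible, unregistered = [], []
    summary = Counter()
    for message in messages:
        outcome = classify(message)
        summary[outcome] += 1
        if outcome == "eligible":
            eligible.append(message)
        elif outcome == "unregistered":
            unregistered.append(message)
    return eligible, unregistered, summary
```

The summary counter plays the role of the summary reports; everything that is neither eligible nor unregistered is simply dropped.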
9. Processing Tracker Data table usage events - Daily
• A scheduled task processes data in the current month’s Tracker Data table
• The task consists of a ‘controller’ script that runs a dozen other scripts, which between them:
• Identify and eliminate usage that falls foul of IRUS exclusions*
• Harvest bibliographic metadata for items that IRUS hasn’t encountered before
• Utilises standard OAI-PMH and APIs
• Includes assigning an IRUS Item Type based on source item types exposed in metadata*
• Collect and validate ORCiDs in item metadata to populate Author Authority tables*
• Perform COUNTER R5 processing that converts usage data to Daily statistics
• See how your data has been processed in the Processing statistics report
• Time for another diagram . . .
* See later slides
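COUNTER processing includes double-click filtering: when the same user clicks the same link repeatedly within 30 seconds, the earlier click is discarded and the later one kept. A minimal sketch, assuming a ‘user’ is identified by IP address plus user agent and that views and downloads are filtered separately; it is not IRUS's actual code.

```python
from datetime import datetime, timedelta
from itertools import groupby

DOUBLE_CLICK_WINDOW = timedelta(seconds=30)


def click_key(event):
    # Assumption: a "user" is identified by IP address + user agent
    return (event["ip"], event["ua"], event["item"], event["action"])


def filter_double_clicks(events):
    """Drop the earlier of two identical clicks made within 30 seconds.

    Events are dicts with ip, ua, item, action and time (a datetime).
    Within each run of rapid repeated clicks only the last one survives.
    """
    ordered = sorted(events, key=lambda e: (click_key(e), e["time"]))
    kept = []
    for _, group in groupby(ordered, key=click_key):
        clicks = list(group)
        for current, following in zip(clicks, clicks[1:] + [None]):
            # Keep this click unless the same user repeats it within the window
            if following is None or following["time"] - current["time"] > DOUBLE_CLICK_WINDOW:
                kept.append(current)
    return kept
```

Clicks at 0, 20 and 40 seconds collapse to one counted click, because each click is followed by a repeat within 30 seconds except the last.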
10. Daily Tracker Data Processing – scheduled process at 6:00am every day
Tables used
• Processing history table
• Monthly Tracker Data table (usage events that occurred two days ago)
• IRUS Item Types Mapping Rules tables
• Author Authority Candidates table
Tracker Data Processing Script
• Data processing: IRUS Daily Exclusions
• Metadata processing (Item Metadata Table):
• Harvest metadata – OAI-PMH
• Harvest OAPEN metadata – OAI-PMH
• Harvest CORE metadata – API
• Harvest Vivli metadata – API
• Harvest Pure dataset metadata – API
• Process author authority candidates (Author Authority Table and Author Authority Item Lookup Table)
• Daily statistics processing:
• Daily eligible COUNTER data processing
• Daily statistics creation (Daily Statistics Tables, provisional statistics)
Outputs
• Summary reports
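The 6:00am task works on usage events that occurred two days ago, and the date a message is received may differ from the date the usage happened. A minimal sketch of selecting by event date and counting provisional daily totals, assuming each row carries an `event_time`, an `item` and a COUNTER metric name; real COUNTER processing derives more than one metric per event, which this ignores.

```python
from collections import Counter
from datetime import date, datetime, timedelta

PROCESSING_LAG_DAYS = 2  # events from two days before the run date


def provisional_daily_stats(tracker_rows, run_date, lag_days=PROCESSING_LAG_DAYS):
    """Count eligible events per (item, metric) for the day `lag_days` before run_date.

    Selection is by when the event happened, not when its message arrived.
    """
    target_day = run_date - timedelta(days=lag_days)
    counts = Counter()
    for row in tracker_rows:
        if row["event_time"].date() == target_day:
            counts[(row["item"], row["metric"])] += 1
    return target_day, counts
```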
11. IRUS exclusions – robot and rogue usage
• Use of the COUNTER User Agent Exclusion List
• Is the minimum COUNTER requirement for robot detection
• Works reasonably well for traditional scholarly publishers behind pay barriers
• But it’s not enough in the open access world
• Besides ‘good’ bots like Googlebot, there are
• ‘bad’ bots that don’t declare themselves as bots but are mostly harmless
• and a host of others: hackers, spammers, dictionary attackers, etc.
• Based on extensive analysis of our logs, we also eliminate usage from
• IPs with 40 or more downloads in a single day
• IP/UAs with 10 or more downloads of a single item in a single day
• IP ranges grouped by their first three octets (i.e. a /24 block) with 300 or more downloads in a single day
• During an audit review, the COUNTER auditors agreed that these are reasonable extra measures to remove robotic/rogue activity from our statistics
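The three daily thresholds translate directly into counting rules. This sketch assumes only downloads count towards each threshold, that addresses are IPv4, and that breaching any threshold removes all of that IP's, IP/UA's or range's usage for the day; those choices are ours, not taken from IRUS documentation.

```python
from collections import Counter

IP_DAILY_DOWNLOADS = 40       # per IP address per day
IP_UA_ITEM_DOWNLOADS = 10     # per IP/user agent, single item, per day
SUBNET_DAILY_DOWNLOADS = 300  # per first-three-octet IPv4 range per day


def subnet_of(ip):
    """First three octets of an IPv4 address, e.g. '192.0.2'."""
    return ".".join(ip.split(".")[:3])


def apply_irus_exclusions(events):
    """Remove all usage from IPs, IP/UAs and ranges that breach a daily threshold.

    Events are dicts with day, ip, ua, item and action ('download' or 'view').
    Only downloads count towards thresholds; whole-day removal is an assumption.
    """
    downloads = [e for e in events if e["action"] == "download"]
    per_ip = Counter((e["day"], e["ip"]) for e in downloads)
    per_ip_ua_item = Counter((e["day"], e["ip"], e["ua"], e["item"]) for e in downloads)
    per_subnet = Counter((e["day"], subnet_of(e["ip"])) for e in downloads)

    bad_ips = {k for k, n in per_ip.items() if n >= IP_DAILY_DOWNLOADS}
    bad_ip_uas = {k[:3] for k, n in per_ip_ua_item.items() if n >= IP_UA_ITEM_DOWNLOADS}
    bad_subnets = {k for k, n in per_subnet.items() if n >= SUBNET_DAILY_DOWNLOADS}

    return [
        e for e in events
        if (e["day"], e["ip"]) not in bad_ips
        and (e["day"], e["ip"], e["ua"]) not in bad_ip_uas
        and (e["day"], subnet_of(e["ip"])) not in bad_subnets
    ]
```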
12. IRUS & Item Types
• When we harvest item metadata from repositories, one of the fields we capture is the dc:type field
• Describes the nature or genre of the item - article, book, thesis, etc.
• It does not describe the Subject or Format of the item
• There is a lack of standardisation in the use of item types across repositories
• We encounter literally thousands of terms in dc:type
• Default lists of item types provided by software platform
• Lists of item types developed by individual institutions
• Controlled vocabularies, including COAR Resource Types
• Terms that have nothing to do with ‘type’
• This isn’t very useful and is a barrier to comparability
• Hence we need an appropriate, meaningful and useful set of item types across the whole of IRUS
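As an illustration of why dc:type alone is hard to compare, a single record can carry several dc:type values, not all of which describe genre. This sketch pulls every dc:type value out of an oai_dc record using only the Python standard library; the sample record is invented.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_types(oai_dc_xml):
    """Return the non-empty dc:type values from an oai_dc metadata record."""
    root = ET.fromstring(oai_dc_xml)
    return [
        element.text.strip()
        for element in root.iter(f"{{{DC_NS}}}type")
        if element.text and element.text.strip()
    ]


# Invented sample record: one genre term plus one term about peer review
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example thesis</dc:title>
  <dc:type>Thesis</dc:type>
  <dc:type>NonPeerReviewed</dc:type>
  <dc:format>application/pdf</dc:format>
</oai_dc:dc>"""

print(dc_types(RECORD))
```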
13. IRUS Item Types Mappings
• The original set of IRUS item types was defined in 2012
• Revisited and revised a number of times
• We used a manual mapping process, which had become unsustainable
• The current set of IRUS item types was defined in July 2022
• Based on analysis of over 4 million item records
• We expanded and enhanced the list, which now consists of 31 IRUS item types
• We now use an automated, rule-based process to map source item types to IRUS item types
• 40+ rules derived from analysis of over 4 million item records
• For more information, see the IRUS
• Item types and mapping policy
• Item type mappings report
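A rule-based mapping of this kind can be sketched as an ordered list of patterns where the first match wins, so more specific rules (book section) must come before general ones (book). The rules, the ignored terms and the target labels are hypothetical and far smaller than the real set of 31 item types and 40+ rules.

```python
import re

# Hypothetical rules, tried in order; the first matching rule wins.
RULES = [
    (re.compile(r"\b(thesis|dissertation)\b", re.IGNORECASE), "Thesis or dissertation"),
    (re.compile(r"\bbook\s*(section|chapter)\b|\bchapter\b", re.IGNORECASE), "Book section"),
    (re.compile(r"\bbook\b", re.IGNORECASE), "Book"),
    (re.compile(r"\b(article|journal)\b", re.IGNORECASE), "Article"),
    (re.compile(r"\bconference\b", re.IGNORECASE), "Conference item"),
    (re.compile(r"\bdata\s*set\b", re.IGNORECASE), "Dataset"),
]
# Hypothetical dc:type values that say nothing about genre
IGNORED_TERMS = {"peerreviewed", "nonpeerreviewed"}


def map_item_type(source_types):
    """Map a record's dc:type values to a single (hypothetical) IRUS item type."""
    for term in source_types:
        if term.strip().lower() in IGNORED_TERMS:
            continue
        for pattern, irus_type in RULES:
            if pattern.search(term):
                return irus_type
    return "Other"
```

Keeping the rules as data rather than code is what makes a programmatic approach easier to maintain than a manual mapping: adding a rule does not touch the matching logic.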
14. Author Authority - ORCiDS
• When we harvest item metadata, we scan for strings that look like ORCiDs
• These are added to the Author Authority Candidates table
• A subsequent script processes each ORCiD candidate
• If the ORCiD isn’t already in our system
• We call the orcid.org API to validate the ORCiD, verify that it exists and retrieve canonical author information
• If the ORCiD is found, we update the Author Authority and Item lookup tables
• If not, the ORCiD is discarded
• If the ORCiD is already known to our system
• We just update the Item lookup table to create an association between the ORCiD and its item
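The candidate-scanning step can be sketched with a regular expression for the ORCiD pattern plus the ISO 7064 MOD 11-2 check digit that every ORCiD carries. The local checksum is an extra filter added here for illustration: it catches typos cheaply but cannot show that an ORCiD exists, which is why the process still needs the orcid.org API call.

```python
import re

# Four groups of four characters; the last character may be the check digit X
ORCID_PATTERN = re.compile(r"\b(\d{4}-\d{4}-\d{4}-\d{3}[\dX])\b", re.IGNORECASE)


def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check digit for the first 15 digits of an ORCiD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the ORCiD is well formed and its check digit matches."""
    digits = orcid.replace("-", "").upper()
    return (len(digits) == 16 and digits[:15].isdigit()
            and orcid_check_digit(digits[:15]) == digits[15])


def find_orcid_candidates(metadata_text):
    """Distinct, checksum-valid ORCiD strings found in free metadata text."""
    found = []
    for match in ORCID_PATTERN.finditer(metadata_text):
        orcid = match.group(1).upper()
        if is_valid_orcid(orcid) and orcid not in found:
            found.append(orcid)
    return found
```

Only the strings returned here would go on to the orcid.org lookup; anything failing the checksum could be discarded without a network call.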
15. Processing Tracker Data table usage events - Monthly
• A set of 24 tasks processes data in the previous month’s Tracker Data table
• e.g. on 3rd September 2022 we produced the stats for August 2022
• The tasks fall (broadly) into four categories
• Data analysis
• Building up a picture of ‘user’ activity over time
• Future improvements in robot and rogue usage detection
• Data processing
• Reprocessing IRUS exclusions across the month
• Metadata processing
• Reprocessing metadata harvesting across the month
• Monthly Statistics Processing
• Producing COUNTER conformant monthly statistics
• Time for another diagram . . .
16. Monthly Tracker Data Processing – (will be scheduled to) run on the 3rd of each month
Tables used
• Processing History table
• Monthly Tracker Data table
• Item Metadata Table
• Author Authority Candidates table
Tracker Data Processing Script
• Data analysis: IP address/User Agent activity and distribution (IP/UA activity tables)
• Data processing: IRUS Exclusions
• Metadata processing:
• Harvest metadata – OAI-PMH & APIs
• Harvest metadata – RIOXX
• Process author authority candidates (Author Authority Table and Author Authority Item Lookup Table)
• Monthly statistics processing:
• Eligible COUNTER data processing
• Monthly statistics creation (Monthly Statistics Tables: IRUS PR & IR, OAPEN PR & IR, CORE PR)
Outputs
• Summary reports
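The monthly run works on the previous calendar month, for example August 2022 when run on 3 September 2022. A minimal sketch of that date window and of counting eligible events per item and metric, assuming the Tracker Data rows have already been through exclusions; the row layout is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def previous_month(run_date):
    """First and last day of the calendar month before run_date."""
    last_day = run_date.replace(day=1) - timedelta(days=1)
    return last_day.replace(day=1), last_day


def monthly_counts(tracker_rows, run_date):
    """Count eligible events per (item, metric) for the previous month."""
    start, end = previous_month(run_date)
    counts = Counter()
    for row in tracker_rows:
        if start <= row["day"] <= end:
            counts[(row["item"], row["metric"])] += 1
    return (start, end), counts
```

Stepping back one day from the first of the current month handles month lengths, leap years and the January to December rollover without special cases.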
17. Metadata Curation
• Historically, we’ve only harvested metadata for an item when first encountered
• We’d only update metadata where we knew it was necessary
• However, it’s become increasingly apparent that we should regularly refresh our metadata records
• There are frequent changes to repository records – (un)deletions, corrections, enhancements . . .
• We’re currently updating all item metadata following the move to automated and updated item type mapping
• We’re implementing regular incremental harvesting to pick up metadata changes in repository records
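Incremental harvesting maps naturally onto OAI-PMH: a ListRecords request with a `from` date returns only records added, changed or deleted since then, paged with resumption tokens. This sketch builds the request URLs and reads the token and deleted-record headers from a response. The repository base URL is hypothetical, deletions only appear if the repository reports them, and no network calls are made.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request for an incremental harvest."""
    if resumption_token:
        # resumptionToken is exclusive: only the verb may accompany it
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date is not None:
            params["from"] = from_date.isoformat()  # day granularity: YYYY-MM-DD
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml):
    """Return the resumption token, or None when the list is complete."""
    token = ET.fromstring(response_xml).find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def deleted_identifiers(response_xml):
    """Identifiers whose record headers are marked status="deleted"."""
    root = ET.fromstring(response_xml)
    return [
        header.findtext(f"{OAI_NS}identifier")
        for header in root.iter(f"{OAI_NS}header")
        if header.get("status") == "deleted"
    ]


# Invented response page: one deleted record, one updated record, more to come
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header status="deleted"><identifier>oai:repo.example.org:42</identifier>
<datestamp>2022-09-02</datestamp></header></record>
<record><header><identifier>oai:repo.example.org:43</identifier>
<datestamp>2022-09-03</datestamp></header></record>
<resumptionToken cursor="0">abc123</resumptionToken>
</ListRecords></OAI-PMH>"""

print(list_records_url("https://repo.example.org/oai", date(2022, 9, 1)))
print(next_resumption_token(RESPONSE), deleted_identifiers(RESPONSE))
```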
18. Data Curation
• Daily statistics tables get very big, very quickly
• Performance and storage issues
• We only keep statistics for the current month and the previous two months
• Older daily statistics are deleted on a monthly basis
• We’re very mindful of GDPR requirements!
• Usage data we gather includes IP addresses
• We store that data securely, and only for as long as we need it
• COUNTER rules require us to keep raw usage data for the current year plus the previous two years
• Each year we delete old log files and old database records that are no longer required
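The two retention rules reduce to simple date cutoffs. A sketch, assuming the cutoffs are computed from the date the clean-up runs: daily statistics are kept from the first day of the month two months back, raw usage data from 1 January two years back.

```python
from datetime import date


def daily_stats_cutoff(today):
    """First day of the month two months before today's month."""
    months = today.year * 12 + (today.month - 1) - 2
    return date(months // 12, months % 12 + 1, 1)


def raw_usage_cutoff(today):
    """1 January of the year two years before today's year."""
    return date(today.year - 2, 1, 1)


def daily_rows_to_delete(rows, today):
    """Rows of daily statistics that fall before the retention window."""
    cutoff = daily_stats_cutoff(today)
    return [row for row in rows if row["day"] < cutoff]
```

Counting months as a single integer avoids special-casing the year boundary when stepping back from January or February.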
19. Exposing statistics – IRUS Custom API
• Once the statistics are in the database we need to expose them
• We have a number of API methods to retrieve
• Daily statistics
• Item level
• Available for the current month and the previous two months
• Monthly statistics
• Item level and Platform level
• Available from the time we started collecting statistics for any given repository
• Formats: JSON, and tabular – CSV/TSV
• Openly available to participants and other third parties
20. Exposing statistics – example API call
https://irus.jisc.ac.uk/api/v3/irus/reports/[report_id]/?
requestor_id=[institutional Requestor_ID]&
begin_date=[YYYY-MM | YYYY-MM-DD]&
end_date=[YYYY-MM | YYYY-MM-DD]
{& optional parameters, e.g. platform, item_id, metric_type, content_type}
Many example calls are available at https://irus.jisc.ac.uk/r5/embed/api/
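A small helper for building these URLs, with a shape check on the dates. `REPORT_ID` and `YOUR_REQUESTOR_ID` are placeholders rather than real values, optional parameters are passed through unchanged, and the regular expression checks only the YYYY-MM or YYYY-MM-DD shape, not that the date actually exists.

```python
import re
from urllib.parse import urlencode

REPORT_URL = "https://irus.jisc.ac.uk/api/v3/irus/reports/{report_id}/"
DATE_SHAPE = re.compile(r"^\d{4}-\d{2}(-\d{2})?$")  # YYYY-MM or YYYY-MM-DD


def irus_report_url(report_id, requestor_id, begin_date, end_date, **optional):
    """Build a report URL; optional parameters whose value is None are skipped."""
    for value in (begin_date, end_date):
        if not DATE_SHAPE.match(value):
            raise ValueError(f"expected YYYY-MM or YYYY-MM-DD, got {value!r}")
    params = {"requestor_id": requestor_id, "begin_date": begin_date, "end_date": end_date}
    params.update({key: val for key, val in optional.items() if val is not None})
    return REPORT_URL.format(report_id=report_id) + "?" + urlencode(params)


url = irus_report_url("REPORT_ID", "YOUR_REQUESTOR_ID", "2022-01", "2022-08",
                      metric_type="Total_Item_Requests", item_id=None)
print(url)
```

The returned URL can be opened in a browser or fetched with any HTTP client; `urlencode` takes care of escaping values such as item identifiers.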
21. Exposing statistics – using the API
Statistics retrieved through the API can be used in:
• Excel (CSV)
• Website (IRUS)
• Website (via widget)
22. Exposing statistics – widget example
More information at https://irus.jisc.ac.uk/r5/embed/widget/
23. What’s happening now and next?
In progress
• Metadata refresh
• Repository size and scale information
• Backend reporting and monitoring
Planned
• COUNTER Release 5.1
• COUNTER Compliance Audit
• R4 stats in the Individual Item Report
Considering
• CORE and repository usage
• Journal information
• Funder information
• Search
• Request reports by email
• Regular reports to your inbox
• Visualisations