IRUS: from counting clicks to COUNTER stats
20 September 2022
What we will cover
• IRUS context and overview
• How does it work?
• Usage data
  • Collecting
  • Handling
  • Processing
  • Storing
• Exposing statistics using the API and examples
• What is next?
• Q&A
2 IRUS: from counting clicks to COUNTER stats - 20 September 2022
IRUS context
IRUS: Open and flexible access to comparable and standardised usage statistics for repositories
• Based on the COUNTER Code of Practice, the international standard for measuring usage of e-resources
• 199 active participating repositories across 159 organisations
• Over 17 million individual items
• Between 2M and 6M usage events received daily
[Diagram: the IRUS family of services: IRUS, IRUS-UK, IRUS-CORE, IRUS-ANZ, IRUS-US, IRUS-OAPEN]
High-level overview
• Collect raw usage data
  • Repositories send logs via the Tracker Protocol
• Process into COUNTER stats
  • Filter out robots, rogue usage and double-clicks
  • Add metadata
• Enrich with additional information
  • ORCIDs
  • IRUS item types
• Expose
  • API based on the COUNTER SUSHI standard
• Present and export
  • Web reporting interface
  • Widget
• Curate the data
How we collect usage data – the Tracker Protocol
• We need a standard approach to collect raw usage data when repository pages are viewed and full content is downloaded
• The Tracker Protocol
  • Devised in collaboration with COUNTER
  • A user* clicks on a link to an item page (i.e. views item metadata) or an associated file (i.e. requests a download)
  • An OpenURL-like log entry (a “tracker message”) is sent to a URL endpoint on the IRUS server for further processing
  • Tracker messages are stored in daily** log files
• The Tracker Protocol specification for COUNTER R5 conformance
* The ‘user’ could be a human or a machine
** Logs are organised by the date messages are received, which isn’t necessarily the date the usage event happened
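To make the tracker message concrete, here is a minimal sketch of how a repository might assemble one. It assumes an OpenURL key/encoded-value (KEV) query string sent to the IRUS endpoint; the endpoint URL is hypothetical, and the parameter names are modelled on OpenURL conventions rather than copied from the Tracker Protocol specification, which remains the authority. The timestamp parameter (here `url_tim`, an assumption) records when the usage happened, which is why it can differ from the date the message is received.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical endpoint: the real URL is supplied to participating repositories.
TRACKER_ENDPOINT = "https://irus.example.org/tracker"

def build_tracker_message(ip, user_agent, item_id, file_url, repository, event, when=None):
    """Return a tracker message URL; parameter names are modelled on OpenURL KEV (assumed)."""
    if event not in ("Investigation", "Request"):
        raise ValueError("event must be 'Investigation' (metadata view) or 'Request' (download)")
    when = when or datetime.now(timezone.utc)
    params = {
        "url_ver": "Z39.88-2004",                        # OpenURL version identifier
        "req_id": ip,                                    # requester: IP address of the 'user'
        "req_dat": user_agent,                           # requester: User-Agent string
        "rft.artnum": item_id,                           # referent: item identifier (e.g. OAI id)
        "svc_dat": file_url,                             # the page or file that was requested
        "rfr_id": repository,                            # referrer: the sending repository
        "rft_dat": event,                                # view (Investigation) or download (Request)
        "url_tim": when.strftime("%Y-%m-%dT%H:%M:%SZ"),  # when the usage event happened (UTC assumed)
    }
    return f"{TRACKER_ENDPOINT}?{urlencode(params)}"
```

A repository plugin would call this once per page view or file download and send the resulting URL with an HTTP GET; batch implementations would instead collect the previous day's events and send them together.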
Tracker Protocol Implementations
• Various software platforms underpin institutional repositories
• Each needs its own Tracker Protocol implementation
• Out-of-the-box standard implementations:
  • DSpace, EPrints, Figshare, Haplo, Fedora-Samvera (on-the-fly, as usage occurs)
  • Worktribe (batch data, previous day’s usage)
• Out-of-the-box partial standard implementation:
  • Elsevier Pure (batch data, previous day’s usage)
    • Only sends data about file downloads, NOT metadata views
• Bespoke standard implementations:
  • CORE, Equella, Other (on-the-fly, as usage occurs)
  • Esploro, Fedora-Other (batch data, previous day’s usage)
• See https://irus.jisc.ac.uk/r5/participate/implement/
Processing log file usage data
• Takes place every day at 3:30am
• A scheduled task processes data in the previous day’s log files
• To put it simply, the task:
  • Gets rid of ‘rubbish’ usage data it finds in the logs
  • Puts eligible usage event data into a Tracker Data table for further processing
• It’s easier to describe more fully in a diagram . . .
Daily Tracker Log Processing – scheduled process at 3:30am each day
• Input: tracker data from 199 repositories, sent on the fly or as a daily batch, stored in daily log files
• Tables used: Processing History table, Trackers table, Repositories table, Server Authority table, Blacklisted servers table
• The Tracker Log Processing Script removes:
  • Usage matching the COUNTER Robot Exclusions
  • Fake referrers
  • Malformed messages
  • Messages from blacklisted servers
• Messages from unknown repositories go to the Unregistered Tracker Data table
• Eligible messages from registered repositories go to the Monthly Tracker Data table
• Summary reports are produced
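The sorting step in the diagram can be sketched as a single classification function. This is a sketch under stated assumptions, not the IRUS script: messages are already parsed into dictionaries, the robot list is a three-pattern stand-in for the COUNTER list, and the 'fake referrer' and 'blacklisted server' checks (including the `server_ip` field) are simplified, hypothetical rules. The order of the checks is also a guess.

```python
import re
from collections import Counter

# Illustrative subset only: the real COUNTER robots list is much longer.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot", r"crawler", r"spider")]
# Hypothetical minimum set of fields a well-formed message must carry.
REQUIRED_FIELDS = ("req_id", "req_dat", "rfr_id", "svc_dat", "url_tim")

def classify(msg, registered_repos, blacklisted_servers, fake_referrers):
    """Return the bucket a parsed tracker message falls into (checks run in order)."""
    if any(not msg.get(field) for field in REQUIRED_FIELDS):
        return "malformed"
    if any(p.search(msg["req_dat"]) for p in ROBOT_PATTERNS):
        return "robot"
    if msg["rfr_id"] in fake_referrers:
        return "fake_referrer"
    if msg.get("server_ip") in blacklisted_servers:
        return "blacklisted"
    if msg["rfr_id"] not in registered_repos:
        return "unregistered"  # kept, but in the Unregistered Tracker Data table
    return "eligible"          # goes on to the Monthly Tracker Data table

def process_log(messages, registered_repos, blacklisted_servers, fake_referrers):
    """Split a day's messages into kept buckets and count every outcome for a summary report."""
    kept = {"eligible": [], "unregistered": []}
    summary = Counter()
    for msg in messages:
        bucket = classify(msg, registered_repos, blacklisted_servers, fake_referrers)
        summary[bucket] += 1
        if bucket in kept:
            kept[bucket].append(msg)
    return kept, summary
```

Keeping unregistered messages rather than discarding them means usage can still be recovered if a repository registers later, which fits the separate table shown in the diagram.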
Processing Tracker Data table usage events - Daily
• A scheduled task processes data in the current month’s Tracker Data table
• The task consists of a ‘controller’ script that runs a dozen other scripts, which between them:
  • Identify and eliminate usage that falls foul of IRUS exclusions*
  • Harvest bibliographic metadata for items that IRUS hasn’t encountered before
    • Using standard OAI-PMH and APIs
    • Including assigning an IRUS Item Type based on the source item types exposed in the metadata*
  • Collect and validate ORCIDs in item metadata to populate Author Authority tables*
  • Perform COUNTER R5 processing that converts usage data to daily statistics
• See how your data has been processed in the Processing statistics report
• Time for another diagram . . .
* See later slides
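Part of the COUNTER R5 processing is double-click filtering, listed in the high-level overview. A minimal sketch, assuming each event carries an IP address, user agent, item, event kind and timestamp, and using IP + user agent as a stand-in for "the same user" (an assumption). It follows the COUNTER idea that repeated clicks on the same item by the same user within 30 seconds count once; keeping the later click, and treating exactly 30 seconds as inside the window, are details to check against the Code of Practice.

```python
from collections import defaultdict
from datetime import timedelta

WINDOW = timedelta(seconds=30)  # COUNTER double-click window

def filter_double_clicks(events):
    """events: dicts with 'ip', 'ua', 'item', 'kind' and 'time' (datetime).
    Returns the events that survive double-click filtering, in time order."""
    groups = defaultdict(list)
    for e in events:
        # Same user (approximated by IP + UA), same item, same kind of click.
        groups[(e["ip"], e["ua"], e["item"], e["kind"])].append(e)
    kept = []
    for group in groups.values():
        group.sort(key=lambda e: e["time"])
        for current, nxt in zip(group, group[1:] + [None]):
            # Drop a click if the same user repeats it within the window;
            # only the later click of each close pair survives.
            if nxt is not None and nxt["time"] - current["time"] <= WINDOW:
                continue
            kept.append(current)
    return sorted(kept, key=lambda e: e["time"])
```

With clicks at 0, 20 and 40 seconds the whole chain collapses to a single event, because each click falls within 30 seconds of the next.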
Daily Tracker Data Processing – scheduled process at 6:00am every day
• Inputs: usage events that occurred two days ago, from the Monthly Tracker Data table; Processing history table; IRUS Item Types Mapping Rules tables; Author Authority Candidates table
• The Tracker Data Processing Script runs three stages and produces summary reports:
  • Data processing
    • IRUS Daily Exclusions
  • Metadata processing (Item Metadata Table)
    • Harvest metadata - OAI-PMH
    • Harvest OAPEN metadata - OAI-PMH
    • Harvest CORE metadata - API
    • Harvest Vivli metadata - API
    • Harvest Pure dataset metadata - API
    • Process author authority candidates (Author Authority Table, Author Authority Item Lookup Table)
  • Daily statistics processing
    • Daily eligible COUNTER data processing
    • Daily statistics creation, writing provisional statistics to the Daily Statistics Tables
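The "Daily statistics creation" stage turns filtered events into COUNTER R5 item metrics. A minimal sketch, assuming events have already passed the exclusions and double-click filtering: every view or download counts as an investigation, only downloads count as requests, and the unique metrics count distinct user-sessions, approximated here with the COUNTER fallback of IP address + user agent + date + hour. The event fields are hypothetical and this is not the IRUS table schema.

```python
from collections import defaultdict

def session_id(e):
    # Fallback session key when no session cookie is available: IP + UA + date + hour.
    t = e["time"]
    return (e["ip"], e["ua"], t.date(), t.hour)

def daily_metrics(events):
    """events: filtered dicts with 'ip', 'ua', 'item', 'time' (datetime) and
    'kind' ('view' for a metadata page, 'download' for a file).
    Returns {(date, item): {COUNTER R5 metric name: count}}."""
    totals = defaultdict(lambda: {"inv": 0, "req": 0})
    sessions = defaultdict(lambda: {"inv": set(), "req": set()})
    for e in events:
        key, sid = (e["time"].date(), e["item"]), session_id(e)
        # Every view or download is an investigation of the item.
        totals[key]["inv"] += 1
        sessions[key]["inv"].add(sid)
        # Only downloads of the content itself are requests.
        if e["kind"] == "download":
            totals[key]["req"] += 1
            sessions[key]["req"].add(sid)
    return {
        key: {
            "Total_Item_Investigations": totals[key]["inv"],
            "Unique_Item_Investigations": len(sessions[key]["inv"]),
            "Total_Item_Requests": totals[key]["req"],
            "Unique_Item_Requests": len(sessions[key]["req"]),
        }
        for key in totals
    }
```

One user who views an item and downloads it twice in the same hour contributes 3 total investigations, 2 total requests, and 1 to each unique metric.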
IRUS exclusions – robot and rogue usage
• Use of the COUNTER User Agent Exclusion List
  • Is the minimum COUNTER requirement for robot detection
  • Works reasonably well for traditional scholarly publishers behind paywalls
  • But it’s not enough in the open access world
• Besides ‘good’ bots like Googlebot, there are
  • ‘bad’ bots that don’t declare themselves as bots but are mostly harmless
  • and a host of others: hackers, spammers, dictionary attackers, etc.
• In addition, based on extensive analysis of our logs, we also eliminate usage from
  • IPs with 40 or more downloads in a single day
  • IP/UA combinations with 10 or more downloads of a single item in a single day
  • IP ranges, grouped by the first three octets, with 300 or more downloads in a single day
• During an audit review, the COUNTER auditors agreed that these are reasonable extra measures to remove robotic/rogue activity from our statistics
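The three extra thresholds are straightforward to express over one day's events. A minimal sketch, assuming IPv4 addresses and events carrying a day, IP, user agent, item and kind. It drops all usage (views as well as downloads) from a flagged IP, IP/UA pair or range on that day, which is one reading of "eliminate usage from"; whether IRUS removes only the downloads is not stated here.

```python
from collections import Counter

IP_LIMIT = 40          # downloads per IP per day
IPUA_ITEM_LIMIT = 10   # downloads of a single item per IP/UA per day
RANGE_LIMIT = 300      # downloads per range (first three octets) per day

def first_three_octets(ip):
    return ip.rsplit(".", 1)[0]  # IPv4 only, e.g. '192.0.2.1' -> '192.0.2'

def apply_irus_exclusions(events):
    """events: dicts with 'day', 'ip', 'ua', 'item' and 'kind' ('view' or 'download')."""
    downloads = [e for e in events if e["kind"] == "download"]
    per_ip = Counter((e["day"], e["ip"]) for e in downloads)
    per_ipua_item = Counter((e["day"], e["ip"], e["ua"], e["item"]) for e in downloads)
    per_range = Counter((e["day"], first_three_octets(e["ip"])) for e in downloads)

    bad_ips = {k for k, n in per_ip.items() if n >= IP_LIMIT}
    bad_ipuas = {k[:3] for k, n in per_ipua_item.items() if n >= IPUA_ITEM_LIMIT}
    bad_ranges = {k for k, n in per_range.items() if n >= RANGE_LIMIT}

    # Keep an event only if none of its day-level groupings was flagged.
    return [
        e for e in events
        if (e["day"], e["ip"]) not in bad_ips
        and (e["day"], e["ip"], e["ua"]) not in bad_ipuas
        and (e["day"], first_three_octets(e["ip"])) not in bad_ranges
    ]
```

Counting per day and then filtering in a second pass keeps the logic independent of event order, which suits a batch job that runs over a whole day's data.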
IRUS & Item Types
• When we harvest item metadata from repositories, one of the fields we capture is dc:type
  • It describes the nature or genre of the item: article, book, thesis, etc.
  • It does not describe the subject or format of the item
• There is a lack of standardisation in the use of item types across repositories
  • We encounter literally thousands of terms in dc:type
    • Default lists of item types provided by software platforms
    • Lists of item types developed by individual institutions
    • Controlled vocabularies, including COAR Resource Types
    • Terms that have nothing to do with ‘type’
  • This isn’t very useful and is a barrier to comparability
• Hence we need an appropriate, meaningful and useful set of item types across the whole of IRUS
IRUS Item Types Mappings
• The original set of IRUS item types was defined in 2012
  • Revisited and revised a number of times
  • We used a manual mapping process, which had become unsustainable
• The current set of IRUS item types was defined in July 2022
  • Based on analysis of over 4 million item records
  • We expanded and enhanced the list, which now consists of 31 IRUS item types
• We now use an automated, programmatic solution to map source item types to those IRUS types
  • 40+ rules derived from analysis of over 4 million item records
• For more information, see the IRUS
  • Item types and mapping policy
  • Item type mappings report
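A minimal sketch of programmatic, rule-based mapping: each rule is a pattern over the harvested dc:type strings, rules are tried in priority order, and anything unmatched falls back to a catch-all type. The rules and type labels below are invented for illustration; they are not the 40+ IRUS rules or the 31 IRUS item types, which are documented in the item types and mapping policy.

```python
import re

# Hypothetical rules and labels for illustration only (NOT the real IRUS rule set).
# Order matters: the first rule that matches any dc:type term wins.
RULES = [
    (re.compile(r"\bthes[ie]s\b|\bdissertation\b", re.I), "Thesis or dissertation"),
    (re.compile(r"\bbook\s+(chapter|section)\b|\bchapter\b", re.I), "Book chapter"),
    (re.compile(r"\bbook\b|\bmonograph\b", re.I), "Book"),
    (re.compile(r"\bconference\b|\bproceedings\b", re.I), "Conference item"),
    (re.compile(r"\barticle\b|\bjournal\b", re.I), "Article"),
    (re.compile(r"\bdata\s*sets?\b", re.I), "Dataset"),
]
FALLBACK = "Other"

def map_item_type(dc_types):
    """Map the dc:type values harvested for one item to a single (hypothetical) type."""
    terms = [t.strip() for t in dc_types if t and t.strip()]
    for pattern, irus_type in RULES:
        if any(pattern.search(term) for term in terms):
            return irus_type
    return FALLBACK
```

Placing "Book chapter" before "Book" is the kind of ordering decision a rule set like this depends on: the more specific pattern must be tried first.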
Author Authority – ORCIDs
• When we harvest item metadata, we scan for strings that look like ORCIDs
  • These are added to the Author Authority Candidates table
• A subsequent script processes each ORCID candidate
  • If the ORCID isn’t already in our system
    • We call the orcid.org API to validate the ORCID, verify that it exists, and retrieve canonical author information
    • If the ORCID is found, we update the Author Authority and Item Lookup tables
    • If not, the ORCID is discarded
  • If the ORCID is already known to our system
    • We just update the Item Lookup table to create an association between the ORCID and its item
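A sketch of the "strings that look like ORCIDs" step. An ORCID iD is four groups of four characters whose final character is an ISO 7064 MOD 11-2 check digit, so a local checksum can weed out typos before any orcid.org API call. Whether IRUS performs such a pre-check is an assumption, and a valid checksum does not prove that the iD exists, which is why the API lookup is still needed.

```python
import re

# Four hyphen-separated groups of four; the last character may be X (or x).
ORCID_PATTERN = re.compile(r"\b(\d{4}-\d{4}-\d{4}-\d{3}[\dXx])\b")

def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def find_orcid_candidates(text):
    """Return ORCID-shaped strings with a valid check digit (deduplicated, in order)."""
    found = []
    for match in ORCID_PATTERN.finditer(text):
        orcid = match.group(1).upper()
        digits = orcid.replace("-", "")
        if orcid_check_digit(digits[:15]) == digits[15] and orcid not in found:
            found.append(orcid)
    return found
```

For example, `0000-0002-1825-0097` passes the checksum, while the single-digit typo `0000-0002-1825-0098` is rejected without a network call.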
Processing Tracker Data table usage events - Monthly
• A set of 24 tasks processes data in the previous month’s Tracker Data table
  • e.g. on 3rd September 2022 we produced the stats for August 2022
• The tasks fall (broadly) into four categories:
  • Data analysis
    • Building up a picture of ‘user’ activity over time
    • Informing future improvements in robot and rogue usage detection
  • Data processing
    • Reprocessing IRUS exclusions across the month
  • Metadata processing
    • Reprocessing metadata harvesting across the month
  • Monthly statistics processing
    • Producing COUNTER-conformant monthly statistics
• Time for another diagram . . .
Monthly Tracker Data Processing – (will be scheduled to) run on the 3rd of each month
• Inputs: Monthly Tracker Data table, Processing History table, Author Authority Candidates table
• The Tracker Data Processing Script runs four stages and produces summary reports:
  • Data analysis (IP/UA activity tables)
    • IP address/User Agent activity
    • IP address/User Agent distribution
  • Data processing
    • IRUS Exclusions
  • Metadata processing (Item Metadata Table)
    • Harvest metadata – OAI-PMH & APIs
    • Harvest metadata – RIOXX
    • Process author authority candidates (Author Authority Table, Author Authority Item Lookup Table)
  • Monthly statistics processing
    • Eligible COUNTER data processing
    • Monthly statistics creation, writing to the Monthly Statistics Tables: IRUS PR & IR, OAPEN PR & IR, CORE PR
Metadata Curation
• Historically, we’ve only harvested metadata for an item when we first encountered it
  • We’d only update metadata where we knew it was necessary
• However, it’s become increasingly apparent that we should regularly refresh our metadata records
  • There are frequent changes to repository records: (un)deletions, corrections, enhancements . . .
• We’re currently updating all item metadata following the move to automated and updated item type mapping
• We’re implementing regular incremental harvesting to pick up metadata changes in repository records
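Incremental harvesting of this kind maps naturally onto OAI-PMH 2.0, whose ListRecords verb accepts a `from` date and pages large result sets with a resumptionToken. A minimal sketch, assuming the repository exposes `oai_dc` and supports day-granularity dates; the base URL is a placeholder and this is not the IRUS harvester. Deleted records are flagged so that (un)deletions can be picked up, provided the repository reports deletions at all.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, since=None, token=None):
    if token:  # a resumption request carries only the verb and the token
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if since:
            params["from"] = since  # only records changed on or after this date
    return base_url + "?" + urlencode(params)

def parse_page(xml_bytes):
    """Return (records, next_token) for one ListRecords response page."""
    root = ET.fromstring(xml_bytes)
    records = []
    for rec in root.iter(OAI + "record"):
        header = rec.find(OAI + "header")
        records.append({
            "id": header.findtext(OAI + "identifier"),
            "deleted": header.get("status") == "deleted",
            "dc_type": [t.text for t in rec.iter(DC + "type") if t.text],
        })
    token_el = root.find(f".//{OAI}resumptionToken")
    # An empty resumptionToken element marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token

def harvest_changes(base_url, since):
    """Yield every record changed since `since` (YYYY-MM-DD), following resumption tokens."""
    token = None
    while True:
        with urlopen(list_records_url(base_url, since, token)) as resp:
            records, token = parse_page(resp.read())
        yield from records
        if not token:
            break
```

Calling `harvest_changes("https://repo.example.ac.uk/oai", "2022-09-01")` would yield each changed record once; storing the date of the last successful run gives the next `from` value.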
Data Curation
• Daily statistics tables get very big, very quickly
  • This causes performance and storage issues
  • We only keep daily statistics for the current month and the previous two months
  • Older daily statistics are deleted on a monthly basis
• We’re very mindful of GDPR requirements!
  • The usage data we gather includes IP addresses
  • We store that data securely, and only for as long as we need it
  • COUNTER rules require us to keep raw usage data for the current year plus the previous two years
  • Each year we delete old log files and database records that are no longer required
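Both retention windows reduce to simple date cutoffs. A minimal sketch, assuming "the current month and the previous two months" and "the current year plus the previous two years" are measured from the date the clean-up runs; anything dated before a cutoff would be eligible for deletion.

```python
from datetime import date

def first_day_months_back(today, months_back):
    """First day of the month `months_back` months before today's month."""
    month_index = today.year * 12 + (today.month - 1) - months_back
    return date(month_index // 12, month_index % 12 + 1, 1)

def retention_cutoffs(today):
    """Earliest date still retained for each kind of data."""
    return {
        "daily_statistics": first_day_months_back(today, 2),  # current month + previous two
        "raw_usage": date(today.year - 2, 1, 1),               # current year + previous two
    }
```

Counting months as a single integer index avoids special-casing the year boundary: in January the daily-statistics cutoff correctly falls on 1 November of the previous year.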
Exposing statistics – IRUS Custom API
• Once the statistics are in the database we need to expose them
• We have a number of API methods to retrieve
  • Daily statistics
    • Item level
    • Available for the current month + the two previous months
  • Monthly statistics
    • Item level and Platform level
    • Available from the time we started collecting statistics for any given repository
• Formats: JSON and tabular (CSV/TSV)
• Openly available to participants and other third parties
Exposing statistics – example API call
https://irus.jisc.ac.uk/api/v3/irus/reports/[report_id]/?
requestor_id=[institutional Requestor_ID]&
begin_date=[YYYY-MM | YYYY-MM-DD]&
end_date=[YYYY-MM | YYYY-MM-DD]
{& optional parameters, e.g. platform, item_id, metric_type, content_type}
Many example calls are available at https://irus.jisc.ac.uk/r5/embed/api/
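A minimal sketch for assembling such a call, using only the base URL and parameters shown above. The report ID and Requestor_ID values are placeholders, and passing any extra optional parameter straight through (rather than checking it against a fixed list) is a design assumption; the example calls page remains the authority on valid values.

```python
import re
from urllib.parse import urlencode

# Base URL as shown on the slide; the report ID is supplied by the caller.
BASE_URL = "https://irus.jisc.ac.uk/api/v3/irus/reports/"
DATE_FORMAT = re.compile(r"\d{4}-\d{2}(-\d{2})?")  # YYYY-MM or YYYY-MM-DD

def irus_report_url(report_id, requestor_id, begin_date, end_date, **optional):
    """Build a report request URL; optional parameters (platform, item_id, ...) pass through."""
    for name, value in (("begin_date", begin_date), ("end_date", end_date)):
        if not DATE_FORMAT.fullmatch(value):
            raise ValueError(f"{name} must be YYYY-MM or YYYY-MM-DD, got {value!r}")
    params = {"requestor_id": requestor_id, "begin_date": begin_date, "end_date": end_date}
    # Drop optional parameters the caller left unset.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{BASE_URL}{report_id}/?{urlencode(params)}"

if __name__ == "__main__":
    # Placeholder values: substitute a real report ID and your institutional Requestor_ID.
    print(irus_report_url("REPORT_ID", "YOUR-REQUESTOR-ID", "2022-01", "2022-08",
                          metric_type="Total_Item_Requests"))
```

Validating the date format locally gives a clear error before a request is sent; all other validation is left to the API itself.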
Exposing statistics – using the API
[Diagram: the API feeds Excel (CSV), the IRUS website, and other websites via the widget]
Exposing statistics – widget example
More information at https://irus.jisc.ac.uk/r5/embed/widget/
What’s happening now and next?
In progress
• Metadata refresh
• Repository size and scale information
• Backend reporting and monitoring
Planned
• COUNTER Release 5.1
• COUNTER Compliance Audit
• R4 stats in the Individual Item Report
Considering
• CORE and repository usage
• Journal information
• Funder information
• Search
• Request reports by email
• Regular reports to your inbox
• Visualisations
Questions
Contact us
Email help@jisc.ac.uk
Mention IRUS in the subject line

Similar to IRUS from counting clicks to COUNTER stats (20)
IARE_BDBA_ PPT_0.pptx
IRUS: how do I
Apache Eagle Strata Hadoop World London 2016
Lecture 1-big data engineering (Introduction).pdf
bigdataintro.pptx
Pirus December 2011
Business analytics and data visualisation
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
Rocana Deep Dive OC Big Data Meetup #19 Sept 21st 2016
IRUS-UK at Repository Fringe 2014
IRUS-UK: Does anyone use the material in your repository?
Monitoring docker containers and dockerized applications
Monitoring-Docker-Container-and-Dockerized-Applications
Monitoring docker container and dockerized applications
Monitoring docker-container-and-dockerized-applications
Analytics&IoT
Shepherd pirus april 2013
Data in Motion - tech-intro-for-paris-hackathon
How to leverage Enterprise Architecture in a regulated environment
Microstrategy Overview

More from Jisc (20)
Towards a code of practice for AI in AT.pptx
Jamworks pilot and AI at Jisc (20/03/2024)
Wellbeing inclusion and digital dystopias.pptx
Accessible Digital Futures project (20/03/2024)
Procuring digital preservation CAN be quick and painless with our new dynamic...
International students’ digital experience: understanding and mitigating the ...
Digital Storytelling Community Launch!.pptx
Open Access book publishing understanding your options (1).pptx
Scottish Universities Press supporting authors with requirements for open acc...
How Bloomsbury is supporting authors with UKRI long-form open access requirem...
Jisc Northern Ireland Strategy Forum 2023
Jisc Scotland Strategy Forum 2023
Jisc stakeholder strategic update 2023
JISC Presentation.pptx
Community-led Open Access Publishing webinar.pptx
The Open Access Community Framework (OACF) 2023 (1).pptx
Are we onboard yet University of Sussex.pptx
JiscOAWeek_LAIR_slides_October2023.pptx
UWP OA Week Presentation (1).pptx
An introduction to Cyber Essentials
Kürzlich hochgeladen

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 

Kürzlich hochgeladen (20)

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 

IRUS from counting clicks to COUNTER stats

  • 1. IRUS: from counting clicks to COUNTER stats 20 September 2022
  • 2. What we will cover • IRUS context and overview • How does it work? • Usage data • Collecting • Handling • Processing • Storing • Exposing statistics using the API and examples • What is next? • Q&A 2 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 3. IRUS context IRUS: Open and flexible access to comparable and standardised usage statistics for repositories • Based on COUNTER Code of Practice, international standard for measuring usage of e-resources • 199 active participating repositories across 159 organisations • Over 17 million individual items • Between 2M and 6M usage events received daily IRUS IRUS-UK IRUS-CORE IRUS-ANZ IRUS-US IRUS-OAPEN 3 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 4. High-level overview Collect raw usage data • Repositories send logs via tracker protocol Process into COUNTER stats • Filter out robots and rogue usage and double-clicks • Add metadata Enrich with additional information • ORCIDs • IRUS item types Expose • API based on COUNTER SUSHI standard Present and export • Web reporting interface • Widget Curate the data 4 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 5. How we collect usage data – the Tracker Protocol • We need a standard approach to collect raw usage data when repository pages are viewed and full content downloaded • The Tracker Protocol • Devised in collaboration with COUNTER • A user* clicks on a link to an item page (i.e. views item metadata) or an associated file (i.e. requests a download) • An OpenURL-like log entry – a “tracker message” - is sent to a URL endpoint on the IRUS server for further processing • Tracker messages are stored in daily** log files • The Tracker Protocol specification for COUNTER R5 conformance * The ‘user’ could be a human or a machine ** The date messages are received, which isn’t necessarily the same as the date a usage event happened 5 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 6. Tracker Protocol Implementations • Various software platforms underpin Institutional Repositories • Each needs its own Tracker Protocol implementation • Out-of-the-box standard implementations: • DSpace, Eprints, Figshare, Haplo, Fedora-Samvera (on-the-fly, as usage occurs) • Worktribe (batch data, previous day’s usage) • Out-of-the-box 50% standard implementation: • Elsevier Pure (batch data, previous day’s usage) • Only sends data about file downloads NOT metadata views • Bespoke standard implementations: • CORE, Equella, Other (on-the-fly, as usage occurs) • Esploro, Fedora-Other (batch data, previous day’s usage) • See https://irus.jisc.ac.uk/r5/participate/implement/ 6 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 7. Processing log file usage data • Takes place every day at 3:30am • A scheduled task processes data in the previous day’s log files • To put it simply: • Gets rid of ‘rubbish’ usage data it finds in the logs • Puts eligible usage event data into a Tracker Data table for further processing • It’s easier to describe more fully in a diagram . . . 7 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 8. Daily Tracker Log Processing – scheduled process at 3:30am each day Tracker data - on the fly 199 repositories Daily log files Tracker data - daily batch Processing History table Trackers table Repositories table Server Authority table Blacklisted servers table Tracker Log Processing Script COUNTER Robot Exclusions Fake referrers Malformed messages Blacklisted servers Messages from unknown repositories Unregistered Tracker Data table Eligible messages from registered repositories Monthly Tracker Data table Summary reports 8 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 9. Processing Tracker Data table usage events - Daily • A scheduled task processes data in current month’s Tracker Data table • Task consists of a ‘controller’ script that runs a dozen other scripts, which between them: • Identify and eliminate usage that falls foul of IRUS exclusions* • Harvest bibliographic metadata for items that IRUS hasn’t encountered before • Utilises standard OAI-PMH and APIs • Includes assigning an IRUS Item Type based on source item types exposed in metadata* • Collect and validate ORCiDs in item metadata to populate Author Authority tables* • Perform COUNTER R5 processing that converts usage data to Daily statistics • See how your data has been processed in the Processing statistics report • Time for another diagram . . . * See later slides 9 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 10. Daily Tracker Data Processing – scheduled process at 6:00am every day Processing history table Monthly Tracker Data table Usage events that occurred two days ago IRUS Item Types Mapping Rules tables Author Authority Candidates table Tracker Data Processing Script Data processing IRUS Daily Exclusions Summary reports Metadata processing Item Metadata Table Harvest metadata - OAI-PMH Harvest OAPEN metadata - OAI-PMH Harvest CORE metadata - API Harvest Vivli metadata - API Harvest Pure dataset metadata - API Process author authority candidates Author Authority Table Author Authority Item Lookup Table Daily statistics processing Daily eligible COUNTER data processing Daily statistics creation Daily Statistics Tables Provisional statistics 10 IRUS: from counting clicks to COUNTER stats - 20 September 2022
  • 11. IRUS exclusions – robot and rogue usage • Use of the COUNTER User Agent Exclusion List • Is the minimum COUNTER requirement for robot detection • Works reasonably well for traditional scholarly publishers behind pay barriers • But it’s not enough in the open access world • Besides ‘good’ bots like Googlebot, there are • ‘bad’ bots that don’t declare themselves as bots but are mostly harmless • and a host of others: hackers, spammers, dictionary attackers, etc. • In addition, based on extensive analysis of our logs, we also eliminate usage from • IPs with 40 or more downloads in a single day • IP/UAs with 10 or more downloads of a single item in a single day • IP ranges grouped by the 1st three octets that have 300 or more downloads in a day • During an audit review, the COUNTER auditors agreed that these are reasonable extra measures to remove robotic/rogue activity from our statistics 11 IRUS: from counting clicks to COUNTER stats - 20 September 2022
12. IRUS & Item Types
• When we harvest item metadata from repositories, one of the fields we capture is dc:type
• It describes the nature or genre of the item – article, book, thesis, etc.
• It does not describe the subject or format of the item
• There is a lack of standardisation in the use of item types across repositories
• We encounter literally thousands of distinct terms in dc:type, including:
• Default lists of item types provided by the software platform
• Lists of item types developed by individual institutions
• Controlled vocabularies, including COAR Resource Types
• Terms that have nothing to do with 'type'
• This isn't very useful and is a barrier to comparability
• Hence we need a set of appropriate, meaningful and useful item types across the whole of IRUS
13. IRUS Item Types Mappings
• The original set of IRUS item types was defined in 2012
• It was revisited and revised a number of times
• We used a manual mapping process, which had become unsustainable
• The current set of IRUS item types was defined in July 2022
• Based on analysis of over 4 million item records, we expanded and enhanced the list, which now consists of 31 IRUS item types
• We now use an automated, programmatic solution to map source item types to those IRUS item types
• It applies 40+ rules derived from analysis of over 4 million item records
• For more information, see the IRUS:
• Item types and mapping policy
• Item type mappings report
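A rule-based mapping of this kind can be sketched as an ordered list of pattern rules, where the first rule that matches any of an item's dc:type values decides its type. The patterns and target labels below are hypothetical illustrations only; they are not the actual IRUS rules, nor the names of the 31 IRUS item types.

```python
import re

# Hypothetical, illustrative rules. The real IRUS mapping uses 40+ rules and
# maps onto 31 IRUS item types. Rules are tried in order; the first match wins,
# so more specific patterns (book chapter) must precede general ones (book).
MAPPING_RULES = [
    (re.compile(r"\b(thesis|dissertation)\b", re.IGNORECASE), "Thesis or dissertation"),
    (re.compile(r"\bjournal\s+article\b|^article$", re.IGNORECASE), "Article"),
    (re.compile(r"\bbook\s+(chapter|section)\b", re.IGNORECASE), "Book section"),
    (re.compile(r"\bbook\b", re.IGNORECASE), "Book"),
    (re.compile(r"\b(dataset|data\s+set)\b", re.IGNORECASE), "Dataset"),
]
FALLBACK_TYPE = "Other"


def map_item_type(dc_types: list[str]) -> str:
    """Map an item's dc:type values to a single (hypothetical) IRUS item type."""
    for pattern, irus_type in MAPPING_RULES:
        if any(pattern.search(value.strip()) for value in dc_types):
            return irus_type
    return FALLBACK_TYPE
```

For example, `map_item_type(["Book Chapter"])` returns `"Book section"` because that rule is checked before the general book rule. Keeping the rules as data rather than code makes them easy to review, reorder and publish in a mappings report.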
14. Author Authority – ORCIDs
• When we harvest item metadata, we scan for strings that look like ORCIDs
• These are added to the Author Authority Candidates table
• A subsequent script processes each ORCID candidate
• If the ORCID isn't already in our system:
• We call the orcid.org API to verify that the ORCID exists and to retrieve canonical author information
• If the ORCID is found, we update the Author Authority and Item lookup tables
• If not, the ORCID is discarded
• If the ORCID is already known to our system:
• We just update the Item lookup table to create an association between the ORCID and its item
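The candidate handling described above can be sketched as follows. This is a minimal sketch under assumptions: it adds a local check of the ORCID check digit (ISO 7064 MOD 11-2, the scheme ORCID identifiers use) as a cheap pre-filter before any API call, which the slide does not say IRUS does; `fetch_orcid_record` is a hypothetical stand-in for the orcid.org API lookup; and an in-memory dict and set stand in for the Author Authority and Item lookup tables.

```python
import re
from typing import Callable, Optional

# Strings that look like ORCIDs, e.g. "0000-0002-1825-0097" or an orcid.org URL.
ORCID_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final character with the ISO 7064 MOD 11-2 algorithm."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


def scan_candidates(metadata_text: str) -> list[str]:
    """Return well-formed ORCID candidates found in harvested metadata."""
    return [m for m in ORCID_PATTERN.findall(metadata_text) if orcid_checksum_ok(m)]


def process_candidate(
    orcid: str,
    item_id: str,
    authority: dict[str, dict],
    item_lookup: set[tuple[str, str]],
    fetch_orcid_record: Callable[[str], Optional[dict]],
) -> bool:
    """Apply the known/unknown ORCID logic; return False if the candidate is discarded."""
    if orcid not in authority:
        record = fetch_orcid_record(orcid)  # hypothetical orcid.org API call
        if record is None:
            return False  # not found: discard the candidate
        authority[orcid] = record  # new Author Authority entry
    item_lookup.add((orcid, item_id))  # associate the ORCID with the item
    return True
```

Only unknown ORCIDs trigger a remote lookup, so repeat appearances of the same author across items cost a single set insert.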
15. Processing Tracker Data table usage events – Monthly
• A set of 24 tasks processes the data in the previous month's Tracker Data table
• e.g. on 3rd September 2022 we produced the statistics for August 2022
• The tasks fall (broadly) into four categories:
• Data analysis – building up a picture of 'user' activity over time; supporting future improvements in robot and rogue usage detection
• Data processing – reprocessing IRUS exclusions across the month
• Metadata processing – reprocessing metadata harvesting across the month
• Monthly statistics processing – producing COUNTER-conformant monthly statistics
• Time for another diagram . . .
16. Monthly Tracker Data Processing – (will be scheduled to) run on the 3rd of each month
[Diagram] Components shown:
• Tables: Processing History table; Monthly Tracker Data table; Item Metadata Table; Author Authority Candidates table; IP/UA activity tables; Author Authority Table; Author Authority Item Lookup Table; Monthly Statistics Tables
• Tracker Data Processing Script, which runs:
• Data analysis – IP address/user agent activity; IP address/user agent distribution
• Data processing – IRUS exclusions
• Metadata processing – harvest metadata (OAI-PMH & APIs); harvest metadata (RIOXX); process author authority candidates
• Monthly statistics processing – eligible COUNTER data processing; monthly statistics creation
• Summary reports
• Monthly statistics outputs: IRUS PR & IR, OAPEN PR & IR, CORE PR (COUNTER Platform and Item Reports)
17. Metadata Curation
• Historically, we've only harvested metadata for an item when we first encountered it
• We'd only update metadata where we knew it was necessary
• However, it has become increasingly apparent that we should refresh our metadata records regularly
• Repository records change frequently – (un)deletions, corrections, enhancements . . .
• We're currently updating all item metadata following the move to automated, updated item type mapping
• We're implementing regular incremental harvesting to pick up metadata changes in repository records
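Incremental harvesting can build on OAI-PMH selective harvesting: a `ListRecords` request with a `from` datestamp returns only records created, changed or deleted since that date, paged with resumption tokens. A minimal sketch, assuming an `oai_dc` endpoint; the base URL and function names are hypothetical, and deleted records are only reported if the repository tracks deletions.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, since: str,
                     resumption_token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request for records changed since `since`."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only `verb` may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
                  "from": since}
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the token for the next page, or None once the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def deleted_identifiers(response_xml: str) -> list[str]:
    """Identifiers of records that the repository reports as deleted."""
    root = ET.fromstring(response_xml)
    return [header.findtext("oai:identifier", default="", namespaces=OAI_NS)
            for header in root.iterfind(".//oai:header[@status='deleted']", OAI_NS)]
```

A harvester would request the first page, then keep requesting with the returned token until `next_resumption_token` gives `None`, recording the run date as the `from` value for next time.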
18. Data Curation
• Daily statistics tables get very big, very quickly
• This causes performance and storage issues
• We only keep daily statistics for the current month and the previous two months
• Older daily statistics are deleted on a monthly basis
• We're very mindful of GDPR requirements!
• The usage data we gather includes IP addresses
• We store that data securely, and only for as long as we need it
• COUNTER rules require us to keep raw usage data for the current year plus the previous two years
• Each year we delete old log files and old database records that are no longer required
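The two retention windows reduce to simple date arithmetic. A minimal sketch, assuming 'current year plus the previous two years' means calendar years; the function names are hypothetical.

```python
from datetime import date


def earliest_daily_stats_month(today: date) -> date:
    """First day of the oldest month whose daily statistics are kept.

    Kept: the current month plus the previous two months.
    """
    month_index = today.year * 12 + (today.month - 1) - 2  # months counted from year 0
    return date(month_index // 12, month_index % 12 + 1, 1)


def earliest_raw_usage_date(today: date) -> date:
    """First day of the oldest calendar year of raw usage data that must be kept."""
    return date(today.year - 2, 1, 1)


def is_expired(record_date: date, cutoff: date) -> bool:
    """True if a record predates the retention cutoff and can be deleted."""
    return record_date < cutoff
```

For example, on 20 September 2022 daily statistics from 1 July 2022 onwards are kept and raw usage data from 1 January 2020 onwards must be retained. Counting months as a single integer avoids special-casing the January wrap-around.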
19. Exposing statistics – IRUS Custom API
• Once the statistics are in the database, we need to expose them
• We have a number of API methods to retrieve:
• Daily statistics – item level; available for the current month plus the previous two months
• Monthly statistics – item level and platform level; available from when we started collecting statistics for any given repository
• Formats: JSON, and tabular (CSV/TSV)
• Openly available to participants and other third parties
20. Exposing statistics – example API call
https://irus.jisc.ac.uk/api/v3/irus/reports/[report_id]/?requestor_id=[institutional Requestor_ID]&begin_date=[YYYY-MM | YYYY-MM-DD]&end_date=[YYYY-MM | YYYY-MM-DD]{&optional parameters, e.g. platform, item_id, metric_type, content_type}
Many example calls are available at https://irus.jisc.ac.uk/r5/embed/api/
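The URL pattern on this slide can be assembled programmatically. A minimal sketch: the report IDs `"pr"` and `"ir"` and the requestor ID in the example are placeholders (check the example calls page for real report identifiers), and `fetch_report_json` assumes the endpoint returns a JSON object, which has not been verified here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://irus.jisc.ac.uk/api/v3/irus/reports"


def irus_report_url(report_id: str, requestor_id: str, begin_date: str,
                    end_date: str, **optional: Optional[str]) -> str:
    """Build a report URL following the pattern on the slide.

    Dates are 'YYYY-MM' or 'YYYY-MM-DD'. Optional parameters such as
    platform, item_id, metric_type or content_type are added only when set.
    """
    params = {"requestor_id": requestor_id,
              "begin_date": begin_date,
              "end_date": end_date}
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{BASE_URL}/{report_id}/?{urlencode(params)}"


def fetch_report_json(url: str) -> dict:
    """Fetch a report, assuming the endpoint returns a JSON object."""
    with urlopen(url) as response:
        return json.load(response)
```

Using `urlencode` rather than string concatenation ensures that parameter values containing spaces or reserved characters are escaped correctly.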
21. Exposing statistics – using the API
[Diagram] The API feeding: Excel (CSV); Website (IRUS); Website (via widget)
22. Exposing statistics – widget example
More information at https://irus.jisc.ac.uk/r5/embed/widget/
23. What's happening now and next?
In progress
• Metadata refresh
• Repository size and scale information
• Backend reporting and monitoring
Planned
• COUNTER Release 5.1
• COUNTER Compliance Audit
• R4 stats in the Individual Item Report
Considering
• CORE and repository usage
• Journal information
• Funder information
• Search
• Request reports by email
• Regular reports to your inbox
• Visualisations
24. Questions
25. Contact us
Email help@jisc.ac.uk, mentioning IRUS in the subject line