RJ Broker: Automating Delivery of Research Output to Repositories
1. RJ Broker:
Automating Delivery of Research
Output to Repositories
Muriel Mewissen – EDINA
RSP Event - London - 12 June 2013
2. Overview
• Need for a broker
• Development of the RJ Broker
• Publisher & Subject Repository Trials
• Future
• Conclusion
3. The need for a Broker
4. Focus
• To increase the number of deposits to UK repositories
• To minimise effort by depositors and IR managers
Support Open Access & Funder mandates
[Diagram labels: Author/PI, Author 2 / Representative, Funders, Publishers, Publisher System, Subject repo, Broker, Institutional Repo (IR, IR, IR)]
6. RJ Broker Project
• Repository based projects at EDINA since 2006
http://edina.ac.uk/projects/
• RJ Broker
– April 2012 to March 2013, extended to July 2013
– Component of the RepNet infrastructure
• RJ Broker is a delivery service for research
output
– Deposit: parcel or letter
– Metadata: address
– Notification: postcard
– Vision: http://oarepojunction.wordpress.com/2013/01/10/rj-broker-a-research-output-delivery-service/
7. RJ Broker Middleware Tool
1. Accept deposit of research articles
NLM DTD, bespoke format, SWORD, Eprints
2. Process the deposits into a common format
RJ Broker code
3. Identify target repositories from metadata
Organisation and Repository Identification (ORI)
http://ori.edina.ac.uk/
4. Handle deposition to registered repositories
SWORD, plugins (Eprints, DSpace, Fedora)
5. Provide tracking ID to content supplier
URIs
6. Notify other repositories with relevant content
Monthly email
7. Allow browsing, search and download
GUI & APIs (Eprints)
“View” is useful for non SWORD systems (CRIS) & individuals
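The seven steps above can be sketched as a small pipeline. This is only an illustrative outline, not the RJ Broker code: the `Record` shape, the `ORG_TO_REPOS` table (a stand-in for an ORI lookup), the `REGISTERED` set, the repository names and the `rec-N` tracking IDs are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical stand-in for an ORI lookup: organisation name -> repository IDs.
ORG_TO_REPOS = {
    "Example University": ["repo-a"],
    "Sample Institute": ["repo-b"],
}
# Hypothetical set of repositories registered for direct deposit.
REGISTERED = {"repo-a"}

_ids = count(1)


@dataclass
class Record:
    title: str
    affiliations: list
    tracking_id: str = ""


@dataclass
class Outcome:
    deposited: list = field(default_factory=list)  # step 4
    notified: list = field(default_factory=list)   # step 6


def normalise(raw: dict) -> Record:
    """Step 2: map a supplier-specific dict onto the common Record shape."""
    return Record(title=raw["title"].strip(),
                  affiliations=list(raw.get("affiliations", [])))


def identify_targets(record: Record) -> list:
    """Step 3: collect repositories for each affiliation, in order, no duplicates."""
    targets = []
    for org in record.affiliations:
        for repo in ORG_TO_REPOS.get(org, []):
            if repo not in targets:
                targets.append(repo)
    return targets


def broker(raw: dict) -> tuple:
    record = normalise(raw)                    # steps 1-2: accept and convert
    record.tracking_id = f"rec-{next(_ids)}"   # step 5: the slides mention URIs
    outcome = Outcome()
    for repo in identify_targets(record):      # step 3
        if repo in REGISTERED:
            outcome.deposited.append(repo)     # step 4: would be a SWORD deposit
        else:
            outcome.notified.append(repo)      # step 6: would go in the monthly email
    return record, outcome
```

A record affiliated with both example organisations would be deposited to `repo-a` and queued as a notification for `repo-b`; affiliations that match nothing are simply skipped.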
8. Publisher & Subject Repository
Trials with the Pilot RJ Broker
9. Pilot RJ Broker
• Demonstrate the functionality
• Real data
• Test the scalability
• Publisher: Nature Publishing Group (NPG)
• Subject Repository: Europe PubMed Central
10. Publisher: NPG
• Record includes
– Metadata: rich, embargo, funder, multiple authors,
ORCID in the future…
– Content: multi-part publication (some content may be embargoed); full-text author final copy (post-print)
• Development work:
– Agree format for the record (NLM DTD based)
– EDINA developed an importer for the data
– Transfer using SWORD 1.3
– NPG added a new stream to their publication workflow that sends data to the RJ Broker
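As a rough illustration of the "Transfer using SWORD 1.3" step, a deposit is an HTTP POST of a package to a collection URI. The sketch below only assembles such a request without sending it. The header names are given as we understand the SWORD 1.3 profile and should be checked against the specification; the function name, the example URL and the packaging identifier are hypothetical.

```python
def build_sword_deposit(collection_uri: str, package: bytes, filename: str,
                        packaging: str, on_behalf_of: str = "",
                        dry_run: bool = False) -> dict:
    """Assemble (but do not send) a SWORD 1.3 style deposit request."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,  # identifies the package format to the server
    }
    if on_behalf_of:
        # Mediated deposit: the supplier deposits on behalf of another user.
        headers["X-On-Behalf-Of"] = on_behalf_of
    if dry_run:
        # Ask the server to validate the deposit without storing it.
        headers["X-No-Op"] = "true"
    return {"method": "POST", "url": collection_uri,
            "headers": headers, "body": package}


# Hypothetical endpoint and packaging URI, for illustration only.
request = build_sword_deposit(
    "https://broker.example.org/sword/deposit/supplier",
    package=b"PK...",
    filename="article.zip",
    packaging="http://example.org/packaging/nlm",
    dry_run=True,
)
```

The returned dictionary can be handed to any HTTP client together with the supplier's credentials.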
11. NPG
• Legal agreements to respect embargo periods
– Between NPG & EDINA
– Between EDINA & IRs
• MIT signed the IR agreement
– Working on data importer for DSpace
• Worth considering in order to receive:
– Quality: Full text publication & rich metadata
– Timely: Straight from the publisher during the
publication process even if embargoed
• Template agreement on request
12. NPG
• Set up took several months
– Time difference
– Relies on voluntary participation
– Requires a small amount of development work
– Legal framework
• Successful data transfer trial between NPG
& RJ Broker in February 2013
• Transfer to test IRs
• NPG ready to start continuous data feed
– A couple of journals first, increasing with take-up
13. Subject Repository: Europe PMC
• Use case supported by Jisc, RepNet & Wellcome
Trust
– UK focus
– Support funder mandates
• Record includes:
– New publication or Update to existing publication
– Metadata only: funders, grant numbers, first author
only, DOI to full text…
– No restrictions to redistribution
14. Europe PMC
• Development work:
– Agree format for the record (bespoke)
– EDINA developed an importer for the data
– MIMAS/EBI get regular data feed from PMC
– Push data from their regular feed to the RJ Broker
– Transfer using SWORD 1.3
• Set up took a few weeks
• Successful data transfer trial between
Europe PMC & RJ Broker in February 2013
• Transfer to test IRs
• Ready to start continuous data feed
– Average 160,000 records per month
15. Europe PMC Trial in Numbers
• ~67,000 records in the trial dataset (~12 days based on an average of 160,000 per month)
• ~60,000 sent to RJ Broker; 7,000 had no affiliation
• ~58,500 successfully received by RJ Broker; 1,500 errors (bad format)
• ~22,500 for which RJ Broker identifies an organisation; 36,000 with no identifiable organisation
• ~14,500 have repositories; 8,000 have no repositories
• 1,665 in the UK; 13,000 worldwide (not UK)
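The funnel above can be restated as stage-to-stage retention rates. The following is a small worked calculation using only the figures quoted on the slide; the 30-day month used for the "~12 days" estimate is an assumption.

```python
# Stage counts as quoted on the slide (most are approximate).
STAGES = [
    ("records in trial dataset", 67_000),
    ("sent to RJ Broker", 60_000),
    ("successfully received", 58_500),
    ("organisation identified", 22_500),
    ("have repositories", 14_500),
    ("in the UK", 1_665),
]


def funnel(stages):
    """Return (name, count, % of previous stage, % of the whole dataset)."""
    total = stages[0][1]
    previous = total
    rows = []
    for name, n in stages:
        rows.append((name, n,
                     round(100 * n / previous, 1),
                     round(100 * n / total, 1)))
        previous = n
    return rows


rows = funnel(STAGES)
for name, n, step_pct, overall_pct in rows:
    print(f"{name:26s} {n:>7,d}  {step_pct:5.1f}% of previous  "
          f"{overall_pct:5.1f}% of dataset")

# Days of feed covered by the trial dataset, assuming a 30-day month.
days_covered = 67_000 / (160_000 / 30)
print(f"dataset covers about {days_covered:.1f} days of feed")
```

On these figures the largest single drop is at organisation identification, where about 38% of received records are retained, and roughly 2.5% of the trial dataset ends up matched to a UK repository.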
17. Europe PMC Trial in Numbers
[Chart: number of associated repositories for records with one organisation identified]
18. Europe PMC Trial in Numbers
Country Code   Country          Number of records
us             USA              5934
gb             United Kingdom   1665
ca             Canada           1099
jp             Japan            722
au             Australia        655
se             Sweden           313
es             Spain            304
nl             Netherlands      299
de             Germany          239
tw             Taiwan           181
fr             France           180
br             Brazil           179
it             Italy            176
be             Belgium          174
th             Thailand         168
za             South Africa     160
sd             Sudan            155
55 other countries with less than 1% of records each: 1836
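As a quick consistency check on the country table, the counts can be summed and converted to shares. The sketch uses only the numbers listed above; the dictionary layout is just one convenient way to hold them.

```python
# Record counts per country code, as listed in the table.
COUNTS = {
    "us": 5934, "gb": 1665, "ca": 1099, "jp": 722, "au": 655, "se": 313,
    "es": 304, "nl": 299, "de": 239, "tw": 181, "fr": 180, "br": 179,
    "it": 176, "be": 174, "th": 168, "za": 160, "sd": 155,
    "other (55 countries)": 1836,
}

total = sum(COUNTS.values())
shares = {code: round(100 * n / total, 1) for code, n in COUNTS.items()}

print(f"total records with a repository: {total}")
for code, pct in shares.items():
    print(f"{code:22s} {pct:5.1f}%")
```

The total comes to 14,439, in line with the roughly 14,500 records that have repositories, and every named country holds at least 1% of that total.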
19. Europe PMC Trial in Numbers: Top UK Institutions
Destination                                                    Number of records
University of Oxford                                           170
University of Cambridge                                        139
University College London                                      119
Imperial College                                               103
University of Edinburgh                                        88
University of Manchester                                       63
University of Bristol                                          61
University of Nottingham, University of Newcastle Upon Tyne    56
Liverpool                                                      55
University of Glasgow                                          52
78 UK institutions in total
20. RJ Broker Trial Installation
GUI preview access
• OA records from trials are available for browsing & downloading
– Check what we have for your institution!
– http://devel.edina.ac.uk:1203/
– !!! It is only a trial & development installation
– !!! Not a service yet
21. RJ Broker Trial
Demonstrate features:
• Importing records from different suppliers
• Storing & Processing (~2s per record)
• Repository Identification
• Delivery
• Browsing & Download
More end-to-end use cases with external IRs
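To put the "~2s per record" figure in context, here is a back-of-the-envelope estimate of the processing time for the Europe PMC feed of about 160,000 records per month. It assumes strictly sequential processing unless more workers are given; the `workers` parameter is hypothetical and not something the slides describe.

```python
SECONDS_PER_RECORD = 2        # "~2s per record" from the slide
RECORDS_PER_MONTH = 160_000   # Europe PMC average quoted for the trial


def processing_hours(records: int,
                     seconds_per_record: float = SECONDS_PER_RECORD,
                     workers: int = 1) -> float:
    """Wall-clock hours to process `records`, split evenly across `workers`."""
    return records * seconds_per_record / workers / 3600


hours = processing_hours(RECORDS_PER_MONTH)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days) per month on one worker")
```

On these assumptions a month of Europe PMC records needs about 89 hours, or 3.7 days, of processing on a single worker.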
23. Immediate Future
• Project extension (31 July 2013)
• Prepare transition to service
– Service installation
– Add functionality
• Email notification to all (non-registered) IRs
• Improve support for different repository platforms
• Bulk transfer of data backlog
• Support RIOXX metadata export
– Early adopters
• IRs
• Data suppliers to establish data feeds
– Start building data store
• Content kept for 1 year to start with
24. Future (after July 2013)
• Transition to Service
– SLD (RepNet/Jisc)
– Roadmap for adding further functionality
• Open for recruitment
– Info 'pack', template, sandbox, help & support
– IR Registration process:
• provides SWORD endpoint credentials to RJ Broker
• IR is configured to accept RJ Broker data
• Opting in to receive embargoed content requires signing a legal agreement
– Data supplier Registration process:
• RJ Broker to provide SWORD access
• Agree format & develop importer
• Enable regular data feed into RJ Broker
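The IR registration details above suggest a simple record per repository plus a delivery rule for embargoed content. The sketch below is one possible reading, not the RJ Broker's actual data model: the class and field names, the example endpoints and the rule that embargoed items go only to IRs that have signed the agreement are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RepositoryRegistration:
    name: str
    sword_endpoint: str               # supplied by the IR at registration
    accepts_embargoed: bool = False   # True only once the legal agreement is signed


@dataclass
class Item:
    title: str
    embargo_until: Optional[date] = None   # None means no embargo


def can_deliver(item: Item, repo: RepositoryRegistration, today: date) -> bool:
    """Deliver unless the item is still under embargo and the IR has not opted in."""
    under_embargo = item.embargo_until is not None and today < item.embargo_until
    return repo.accepts_embargoed or not under_embargo


# Hypothetical registrations and item, for illustration only.
signed = RepositoryRegistration("IR with agreement", "https://ir-a.example.org/sword",
                                accepts_embargoed=True)
unsigned = RepositoryRegistration("IR without agreement", "https://ir-b.example.org/sword")
embargoed = Item("Post-print under embargo", embargo_until=date(2013, 12, 1))
```

With these examples, the embargoed item is deliverable to the first IR on 12 June 2013 but is held back from the second until the embargo date has passed.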
26. Conclusion
• Effective solution to content dissemination
• Benefits all
– Increase deposit to IRs
– Support OA (Gold & Green)
– Help with reporting
– Support promotion of research output
– Save time & effort (money)
• Appeal of service will grow
• Small amount of development work needed
locally but it is worth it!
27. Thanks
• You
• RSP
• EDINA Team
– Ian Stuart - Cesare Bellini
– Muriel Mewissen - Christine Rees
– Peter Burnhill - Theo Andrews
• NPG, MIT, Europe PMC, MIMAS, EBI, Wellcome
Trust
• UK RepositoryNet+
• Jisc