Leabharlann UCD
An Coláiste Ollscoile, Baile
Átha Cliath,
Belfield, Baile Átha Cliath 4,
Eire
UCD Library
University College Dublin,
Belfield, Dublin 4, Ireland
Robot hunter
Or, precisely what I thought I wouldn’t
be doing when I became a librarian
Joseph Greene
Research Repository Librarian
joseph.greene@ucd.ie
http://researchrepository.ucd.ie
Counting downloads
• Open Access repositories make science and
scholarship accessible, and we need to
demonstrate our value
• Simple question: how often are these papers
used? How many times have they been
downloaded?
Enter the Robot
• At least 18% of web requests are from robots
• Less than half can be accounted for by the five
main search engines
• At Research Repository UCD, two-thirds of our
repository’s downloads are flagged as coming from web robots
What are you talking about?
Internet robot, Web robot, automated agent,
crawler, spider, bot: any programme that visits
websites and systematically retrieves information
from them
Good and bad
• Good: search engines, link verifiers, computer science
experiments
• Bad: gathering content for spam, phishing and copycat
sites; artificially improving a website’s ranking
(spamdexing); looking for security holes; DDoS
attacks…
‘And the noisy, nasty nuisance grew, ‘til
the villagers cried, “What can we do?”’
Detection methods:
• Blocking robots in real-time:
Turing tests
• Detecting later and removing
from statistics
Appropriate, but problematic methods
for repositories
• Excluding known robots by user-agent name
– Easily faked or omitted
• Excluding by IP address
– Addresses are reassigned (DHCP), and the list of robot IPs is growing exponentially
• Usage pattern analysis: query rate and resources
requested
– Expensive to automate
• Machine learning: training decision trees, neural
nets and/or statistical systems
– Did you say expensive???
• Combined approaches
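The first method in the list, excluding robots by user-agent name, can be sketched as follows. This is a minimal illustration, not how any particular repository platform does it: the pattern list, the record layout and the function names are all hypothetical, and treating a missing agent as a robot is an assumption made here for the example.

```python
import re

# Hypothetical sample patterns; a real deployment would load a maintained
# list of robot agent names rather than hard-coding four substrings.
ROBOT_PATTERNS = [r"bot", r"crawl", r"spider", r"slurp"]
ROBOT_RE = re.compile("|".join(ROBOT_PATTERNS), re.IGNORECASE)


def is_declared_robot(user_agent):
    """Return True when the agent is missing or matches a robot pattern."""
    # Assumption for this sketch: an omitted agent ("" or "-") counts as a robot.
    if not user_agent or user_agent.strip() in ("", "-"):
        return True
    return ROBOT_RE.search(user_agent) is not None


def filter_downloads(records):
    """Keep only download records whose user-agent does not look like a robot."""
    return [r for r in records if not is_declared_robot(r["user_agent"])]
```

A robot that sends a browser-like agent string passes straight through this filter, which is the "easily faked" weakness noted above.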
Effectiveness, and out-of-the-box repository strategies
Strength: strongest technique first
Robots detected by              Recall (%)   Precision (%)
No images requested                  98.34           75.48
No referring site                    96.27           52.25
List of IP addresses                 69.29           99.40
HEAD method to access site           32.37          100.00
Agent name declared                  26.56          100.00
Access only at night                 24.48           50.43
Robots.txt file accessed             17.01          100.00
Time, σ (3s)                          2.49          100.00
Time, average (1s)                    2.49           75.00
DSpace uses IP addresses of
known agents – much weaker than
in the benchmarking study
Eprints filters based on number of
hits from an IP address per day –
similar to time based strategies in
the benchmarking study
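For readers less familiar with the two columns in the benchmarking table: recall is the share of real robots that a technique catches, and precision is the share of flagged sessions that really are robots. A small sketch of the arithmetic, using made-up counts that do not come from the study:

```python
def recall(true_positives, false_negatives):
    """Percentage of actual robots that the technique flagged."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Percentage of flagged sessions that were actually robots."""
    return 100.0 * true_positives / (true_positives + false_positives)


# Hypothetical labelled log: 100 robot sessions in total. The technique flags
# 80 sessions, of which 60 are robots and 20 are human visitors.
robots_caught = 60
robots_missed = 40
humans_flagged = 20

print(recall(robots_caught, robots_missed))      # 60.0
print(precision(robots_caught, humans_flagged))  # 75.0
```

A technique with 100% precision but low recall never miscounts a human as a robot, but it leaves most robot downloads in the statistics.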
Centralised strategy: IRUS-UK
• Collects and filters statistics from 84 DSpace and
Eprints repositories
• COUNTER compliant usage statistics
• Robot exclusion:
– The COUNTER list of agent names
– All downloads from IP addresses where there are
more than 200 downloads in a day from a
repository
– Most downloads from IP addresses where there are
more than 100 downloads in a day from a
repository
• Work commissioned to investigate feasibility and
approach to adaptive filtering based on usage
behaviour
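The 200-downloads-a-day rule above can be sketched as a simple count per IP address, day and repository. This is an illustration under stated assumptions, not IRUS-UK's actual code: the record fields (`ip`, `date`, `repository`) and the function name are hypothetical. The 100-downloads rule is left out because the slide does not say which downloads are kept.

```python
from collections import Counter


def exclude_heavy_ips(downloads, limit=200):
    """Drop every download from an IP with more than `limit` downloads
    from one repository on one day."""

    def key(d):
        # Hypothetical record layout: one dict per download event.
        return (d["ip"], d["date"], d["repository"])

    per_day = Counter(key(d) for d in downloads)
    # "More than 200" means an IP with exactly 200 downloads is still counted.
    return [d for d in downloads if per_day[key(d)] <= limit]
```

A threshold like this is cheap to run, but it also discards genuine heavy use arriving through a shared address, such as a campus proxy.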
Sources by slide
1 Bill Gosper's Glider Gun in action—a variation of Conway's Game of Life. Johan G.
Bontes.
<https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#/media/File:Gospers_glider_gun.gif>
3, 6, 7 Doran, D.; Gokhale, S.S. Web robot detection techniques: overview and
limitations. Data Mining and Knowledge Discovery (2011) 22:183-210.
DOI:10.1007/s10618-010-0180-z
4 http://pixabay.com/static/uploads/photo/2015/05/31/12/09/wooden-791421_640.jpg
5 Bad Robot Productions logo. 2001-2008.
<https://en.wikipedia.org/wiki/Bad_Robot_Productions#/media/File:Bad_Robot_Productions_logo.jpg>
6 Burroway, J., Lord, J. V. The Giant Jam Sandwich. 1972, Houghton Mifflin Harcourt.
8, 9, 10 Nick Geens, Johan Huysmans, Jan Vanthienen. Evaluation of Web Robot
Discovery Techniques: A Benchmarking Study. Advances in Data Mining.
Applications in Medicine, Web Mining, Marketing, Image and Signal Mining.
Lecture Notes in Computer Science 4065, pp 121-130, 2006.
DOI:10.1007/11790853_10
8 Diggory, Mark. SOLR Statistics. DSpace Wiki.
<https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics>
9 Joint, Nicholas. [EP-tech] Re: Please change the way IRstats works. Eprints_tech
mailing list 2011-10-13 <http://www.eprints.org/tech.php/15695.html>
11 IRUS-UK. <http://www.irus.mimas.ac.uk/participants/>
Thank you!

Weitere ähnliche Inhalte

Andere mochten auch

bradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-librarybradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-libraryUCD Library
 
What Is LibGuides?
What Is LibGuides?What Is LibGuides?
What Is LibGuides?UCD Library
 
The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...UCD Library
 
EU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over EuropeEU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over EuropeMarc Garriga
 
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...UCD Library
 
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...UCD Library
 
Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...UCD Library
 
UKASFP Conference 2009
UKASFP Conference 2009UKASFP Conference 2009
UKASFP Conference 2009carl plant
 
Presentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, BudapestPresentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, BudapestMarc Garriga
 
Seeing through learners' eyes
Seeing through learners' eyesSeeing through learners' eyes
Seeing through learners' eyesUCD Library
 
Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...UCD Library
 
Week 2 Uf 5163
Week 2 Uf 5163Week 2 Uf 5163
Week 2 Uf 5163Mohd Yusak
 
Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing Mark Szabo
 
Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?UCD Library
 
Roger matisse
Roger matisseRoger matisse
Roger matisseIrisat
 
Data driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datosData driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datosMarc Garriga
 

Andere mochten auch (20)

bradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-librarybradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-library
 
What Is LibGuides?
What Is LibGuides?What Is LibGuides?
What Is LibGuides?
 
The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...
 
EU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over EuropeEU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over Europe
 
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
 
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
 
Paula
PaulaPaula
Paula
 
Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...
 
Presentation2
Presentation2Presentation2
Presentation2
 
UKASFP Conference 2009
UKASFP Conference 2009UKASFP Conference 2009
UKASFP Conference 2009
 
Presentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, BudapestPresentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, Budapest
 
Seeing through learners' eyes
Seeing through learners' eyesSeeing through learners' eyes
Seeing through learners' eyes
 
Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...
 
Presentation6
Presentation6Presentation6
Presentation6
 
Week 2 Uf 5163
Week 2 Uf 5163Week 2 Uf 5163
Week 2 Uf 5163
 
My 2 cents on Productivity
My 2 cents on ProductivityMy 2 cents on Productivity
My 2 cents on Productivity
 
Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing
 
Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?
 
Roger matisse
Roger matisseRoger matisse
Roger matisse
 
Data driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datosData driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datos
 

Ähnlich wie Robot Hunter: or precisely what I thought I wouldn't be doing when I became a librarian

Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesUCD Library
 
hacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxhacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxsconalbg
 
Discovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case StudyDiscovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case StudyHong (Jenny) Jing
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeEdward Baker
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeVince Smith
 
Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2Spyglass Security
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science ServicesIan Foster
 
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016Danny Akacki
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchainjasonhaddix
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryIan Foster
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Alex Pinto
 
Dafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptxDafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptxAlfredObia1
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013Kirill Osipov
 
DEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And AttributionDEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And AttributionMichael Boman
 
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007Jason Hong
 
Chapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptxChapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptxMahdiHasanSowrav
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 

Ähnlich wie Robot Hunter: or precisely what I thought I wouldn't be doing when I became a librarian (20)

Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access Resources
 
hacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxhacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptx
 
Discovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case StudyDiscovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case Study
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Solr for Data Science
Solr for Data ScienceSolr for Data Science
Solr for Data Science
 
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchain
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
 
Dafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptxDafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptx
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
DEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And AttributionDEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And Attribution
 
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
 
Bots & spiders
Bots & spidersBots & spiders
Bots & spiders
 
Chapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptxChapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptx
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 

Mehr von UCD Library

The role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityThe role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityUCD Library
 
Collection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryCollection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryUCD Library
 
The authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesThe authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesUCD Library
 
Show and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationShow and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationUCD Library
 
Print to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryPrint to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryUCD Library
 
Appearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersAppearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersUCD Library
 
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...UCD Library
 
UCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library
 
Going Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaGoing Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaUCD Library
 
Going Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaGoing Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaUCD Library
 
Clifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewClifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewUCD Library
 
UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...UCD Library
 
Optimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital LibraryOptimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital LibraryUCD Library
 
Creating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital CollectionCreating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital CollectionUCD Library
 
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...UCD Library
 
Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...UCD Library
 
UCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining CollectionsUCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining CollectionsUCD Library
 
Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...UCD Library
 
Pin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locationsPin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locationsUCD Library
 
Real Life Digital Curation and Preservation
Real Life Digital Curation and PreservationReal Life Digital Curation and Preservation
Real Life Digital Curation and PreservationUCD Library
 

Mehr von UCD Library (20)

The role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityThe role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrity
 
Collection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryCollection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD Library
 
The authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesThe authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA Humanities
 
Show and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationShow and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and education
 
Print to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryPrint to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital Library
 
Appearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersAppearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishers
 
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
 
UCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for Researchers
 
Going Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaGoing Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in China
 
Going Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaGoing Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in China
 
Clifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewClifden Arts Festival Archive@UCD: an Overview

Robot Hunter: or precisely what I thought I wouldn't be doing when I became a librarian

  • 1. Leabharlann UCD An Coláiste Ollscoile, Baile Átha Cliath, Belfield, Baile Átha Cliath 4, Eire UCD Library University College Dublin, Belfield, Dublin 4, Ireland Robot hunter Or, precisely what I thought I wouldn’t be doing when I became a librarian Joseph Greene Research Repository Librarian joseph.greene@ucd.ie http://researchrepository.ucd.ie
  • 2. Counting downloads • Open Access repositories make science and scholarship accessible, and we need to demonstrate our value • Simple question: how often are these papers used? How many times have they been downloaded?
  • 3. Enter the Robot • At least 18% of web requests are from robots • Less than half can be accounted for by the five main search engines • At Research Repository UCD, 2/3rds of our repository’s downloads are marked as web robots
  • 4. What are you talking about? Internet robot, Web robot, automated agent, crawler, spider, bot: any programme that visits websites and systematically retrieves information from them
  • 5. Good and bad • Search engines, link verifiers, computer science experiments • Gathering content for spam, phishing and copycat sites, artificially improving a website’s ranking (spamdexing), looking for security holes, DDoS attacks…………
  • 6. ‘And the noisy, nasty nuisance grew, ‘til the villagers cried, “What can we do?”’ Detection methods: • Blocking robots in real-time: Turing tests • Detecting later and removing from statistics
  • 7. Appropriate, but problematic methods for repositories • Excluding known robots by user-agent name – Easily faked or omitted • Excluding by IP address – DHCP, and list is growing exponentially • Usage pattern analysis: query rate and resources requested – Expensive to automate • Machine learning: training decision trees, neural nets and/or statistical systems – Did you say expensive??? • Combined approaches
  • 8. Effectiveness, and out-of-the-box repository strategies
       Strength:
       Robots detected by             Recall (%)   Precision (%)
       No images requested               98.34         75.48
       No referring site                 96.27         52.25
       List of IP addresses              69.29         99.40
       HEAD method to access site        32.37        100.00
       Agent name declared               26.56        100.00
       Access only at night              24.48         50.43
       Robots.txt file accessed          17.01        100.00
       Time, σ (3s)                       2.49        100.00
       Time, average (1s)                 2.49         75.00
       DSpace uses IP addresses of known agents – much weaker than in the benchmarking study
  • 9. Effectiveness, and out-of-the-box repository strategies (same table as slide 8). Eprints filters based on number of hits from an IP address per day – similar to time-based strategies in the benchmarking study
  • 10. Effectiveness, and out-of-the-box repository strategies (same table as slide 8)
  • 11. Centralised strategy: IRUS-UK • Collects and filters statistics from 84 DSpace and Eprints repositories • COUNTER compliant usage statistics • Robot exclusion: – The COUNTER list of agent names – All downloads from IP addresses where there are more than 200 downloads in a day from a repository – Most downloads from IP addresses where there are more than 100 downloads in a day from a repository • Work commissioned to investigate feasibility and approach to adaptive filtering based on usage behaviour
  • 12. Sources by slide
       1: Bill Gosper's Glider Gun in action, a variation of Conway's Game of Life. Johan G. Bontes. <https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#/media/File:Gospers_glider_gun.gif>
       3, 6, 7: Doran, D.; Gokhale, S.S. Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery (2011) 22:183-210. DOI:10.1007/s10618-010-0180-z
       4: http://pixabay.com/static/uploads/photo/2015/05/31/12/09/wooden-791421_640.jpg
       5: Bad Robot Productions logo. 2001-2008. <https://en.wikipedia.org/wiki/Bad_Robot_Productions#/media/File:Bad_Robot_Productions_logo.jpg>
       6: Burroway, J.; Lord, J. V. The Giant Jam Sandwich. 1972, Houghton Mifflin Harcourt.
       8, 9, 10: Geens, N.; Huysmans, J.; Vanthienen, J. Evaluation of Web Robot Discovery Techniques: A Benchmarking Study. Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. Lecture Notes in Computer Science 4065, pp 121-130, 2006. DOI:10.1007/11790853_10
       8: Diggory, Mark. SOLR Statistics. DSpace Wiki. <https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics>
       9: Joint, Nicholas. [EP-tech] Re: Please change the way IRstats works. Eprints_tech mailing list 2011-10-13 <http://www.eprints.org/tech.php/15695.html>
       11: IRUS-UK. <http://www.irus.mimas.ac.uk/participants/>
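
The volume-based exclusion rule on slide 11 and the recall and precision figures on slide 8 can be illustrated with a short sketch. This is not IRUS-UK's code: the `Download` record, the function names, and the sample data are hypothetical, and only the "more than 200 downloads in a day" rule is implemented, because the slides do not say which downloads are kept in the 100 to 200 range.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Download(NamedTuple):
    """Hypothetical log record; the field names are assumptions."""
    ip: str
    day: date
    repository: str
    item: str


def split_by_daily_volume(downloads, exclude_above=200):
    """Split downloads into (kept, excluded).

    A download is excluded when its IP address made more than
    `exclude_above` downloads from the same repository on the same day.
    """
    counts = Counter((d.ip, d.day, d.repository) for d in downloads)
    kept, excluded = [], []
    for d in downloads:
        if counts[(d.ip, d.day, d.repository)] > exclude_above:
            excluded.append(d)
        else:
            kept.append(d)
    return kept, excluded


def recall_precision(flagged, actual_robots):
    """Recall: share of real robots that were flagged.
    Precision: share of flagged visitors that really are robots."""
    flagged, actual_robots = set(flagged), set(actual_robots)
    true_positives = len(flagged & actual_robots)
    recall = true_positives / len(actual_robots) if actual_robots else 0.0
    precision = true_positives / len(flagged) if flagged else 0.0
    return recall, precision


if __name__ == "__main__":
    day = date(2015, 6, 1)
    log = [Download("203.0.113.5", day, "repo-a", f"item-{i}") for i in range(201)]
    log += [Download("192.0.2.1", day, "repo-a", "item-1")] * 3
    kept, excluded = split_by_daily_volume(log)
    print(len(kept), "kept;", len(excluded), "excluded")
```

A method with high recall but low precision, such as "no referring site" in the table, catches most robots but also discards many human downloads. A method with 100% precision and low recall never discards a human download but misses most robots.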