1. WEB TECHNOLOGIES FOR LIBRARIES - 2 WORKSHOPS
June 28-29, 2011, in Petrozavodsk
Web-scale discovery systems
Karen J. Buset, NTNU University Library
Trondheim, Norway
2. WHY
Google has set the standard for searching not only for our users, but
also for a lot of librarians.
Federated search was implemented at libraries in an attempt to
compete with Google and Google Scholar.
This failed because of the limitations of the technology:
• the small number of resources that could be searched
simultaneously
• slow response times and problems merging results
• the need to deal with many different and constantly changing
interfaces
3. WHAT
Content
• harvest content from local and remotely hosted repositories
• create a centralized index, down to the article level
• optimize the index for rapid search and retrieval of results ranked
by relevancy
• combine harvested local library resources with brokered
agreements with publishers and aggregators that allow access to
metadata and/or full-text content
Discovery
• single search box providing a Google-like search experience
Delivery
• quick results ranked by relevancy
• modern interface offering functionality such as faceted navigation
to drill down to more specific results
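Faceted navigation can be illustrated with a minimal sketch. The record fields ("format", "year") and the function names are assumptions made for this example; they do not come from any of the discovery systems discussed here.

```python
from collections import Counter

def facet_counts(results, field):
    """Count how many results fall under each value of one field."""
    return Counter(rec[field] for rec in results if field in rec)

def drill_down(results, field, value):
    """Keep only the results whose field matches the chosen facet value."""
    return [rec for rec in results if rec.get(field) == value]

# Hypothetical result set for illustration only.
results = [
    {"title": "Discovery systems", "format": "article", "year": 2011},
    {"title": "Library catalogues", "format": "book", "year": 2009},
    {"title": "Federated search", "format": "article", "year": 2008},
]

counts = facet_counts(results, "format")    # values shown next to the result list
articles = drill_down(results, "format", "article")
```

The counts are what an interface would show beside the result list; selecting one value narrows the list to the matching records.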
Flexibility
• agnostic to underlying systems
• more open than traditional library systems, giving a library
more scope to customize the services
4. HOW
Each vendor has agreements with several content suppliers from
whom they harvest materials. In addition, they harvest locally held
material such as existing library catalogues and institutional
repositories within the library using protocols such as OAI-PMH and
FTP.
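As a rough sketch of what an OAI-PMH harvest involves: the harvester sends a ListRecords request, reads the records in the response, and uses the resumption token to ask for the next batch. The base URL, the sample response and the function names below are assumptions for illustration; the verb, parameters and XML namespaces are those of OAI-PMH 2.0 with Dublin Core metadata.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a follow-up request carries only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(OAI + "header/" + OAI + "identifier")
        titles = [t.text for t in rec.iter(DC + "title")]
        records.append({"id": identifier, "titles": titles})
    token = root.findtext(".//" + OAI + "resumptionToken")
    token = token.strip() if token else None
    return records, (token or None)

# Hypothetical response from a hypothetical repository, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Web-scale discovery systems</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://repo.example.org/oai")
records, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://repo.example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL over HTTP and repeat until no resumption token is returned; that loop is left out so the sketch runs without network access.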
Pre-harvesting eliminates the need to merge results as was the case
with federated search, which in turn makes de-duplication and
relevancy ranking easier.
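The following toy sketch shows why a single pre-harvested index simplifies de-duplication and ranking: all records sit in one place, so duplicates can be removed once and every hit is scored on the same scale. The matching key (normalized title plus year) and the term-frequency score are assumptions chosen for brevity, not the method of any vendor.

```python
import re
from collections import Counter, defaultdict

def normalize(text):
    """Lowercase and reduce punctuation to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) key."""
    seen = {}
    for rec in records:
        seen.setdefault((normalize(rec["title"]), rec["year"]), rec)
    return list(seen.values())

def build_index(records):
    """Map each title term to the records containing it, with term counts."""
    index = defaultdict(Counter)
    for doc_id, rec in enumerate(records):
        for term in normalize(rec["title"]).split():
            index[term][doc_id] += 1
    return index

def search(index, records, query):
    """Rank records by how often the query terms occur in their titles."""
    scores = Counter()
    for term in normalize(query).split():
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    return [records[doc_id] for doc_id, _ in scores.most_common()]

# Hypothetical records harvested from three different sources.
harvested = [
    {"title": "Web-Scale Discovery: What and Why?", "year": 2011, "source": "publisher"},
    {"title": "Web scale discovery what and why", "year": 2011, "source": "catalogue"},
    {"title": "Federated search in libraries", "year": 2008, "source": "repository"},
]

unique = deduplicate(harvested)
index = build_index(unique)
hits = search(index, unique, "web discovery")
```

With federated search the same steps would have to run at query time on partial result sets coming back from each remote source, which is where the merging problems arose.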
Users can search all available metadata, but authentication is needed
to get access to full text. In this way, Google-like functionality is
provided to a delimited collection of resources.
5. PROBLEMS
The system vendors agree that there will still be a need for direct
access to specialised search interfaces because:
• Some resources are not indexed
• Some resources are not full-text indexed
• Some resources are not available
• Some databases might offer specialised search tools not available
in web-scale-discovery systems
6. SYSTEMS
«The big 4»
• OCLC’s WorldCat Local
• Ex Libris’ Primo
• Serials Solutions’ Summon
• EBSCO Discovery Service
Find links to the systems here:
https://sites.google.com/site/urd2comparison/home
7. USER OPINION
Several surveys at the library show that the users want a simple,
Google-like interface that will quickly provide them with relevant
results.
It seems probable that any of these systems has enough
coverage to satisfy most users.
Relevance ranking in these systems cannot be compared with Google's
PageRank; providing good relevance ranking may be a challenge in
a service that aggregates such diverse metadata.
8. SOURCES
A brief overview of three web-scale-discovery systems: Summon,
Primo and OCLC WorldCat. NTNU University Library. 2011.
Jason Vaughan. Web Scale Discovery: What and Why? Library
Technology Reports. 2011.